CHAPTER 1

Introduction 

Improvements in semiconductor processing have accelerated the growth
in complexity of VLSI chips, which may contain hundreds of thousands
of transistors. To deal with this complexity, two concepts are
generally applied: decomposition, which is the process of breaking a
problem into manageable pieces, and abstraction, which is the
technique of hiding unnecessary detail. Applying these two principles
in VLSI design results in a multi-level, hierarchical approach to the
design of a complex chip [1]. The set of design verification tools
corresponding to the levels of the hierarchy is shown in Fig. 1.1 [2].

Functional simulators are used at the initial design phase to verify
the algorithms of the digital system to be implemented. Once the
design meets the criteria for behavioral completeness, an RTL
(Register Transfer Level) simulator can be used to verify the
potential implementation of the structure. With each RTL module
further partitioned into low-level logic building blocks, the logic
design is validated by a logic simulator such as MOSSIM or SALOGS
[3,4].

The gate-level design is implemented in integrated circuits by
transistors and associated interconnections. For the analysis of
small circuit blocks, circuit simulators such as SPICE2 [5] have
proved effective by providing accurate voltage and current waveforms.
The larger blocks may be analyzed in less detail by using a
'large-scale' circuit simulator such as MOTIS, MOTIS-C, or MOSTAP
[6,7,8].

[Figure: levels shown include Functional Analysis, Large-Scale
Circuit Simulation, Device Simulation and Process Simulation.]

Fig. 1.1 The Hierarchy of Design Verification Tools.

For VLSI design, there are major constraints such as die size, speed
and power, which are taken into consideration at each level, often
forcing a designer to backtrack when a constraint cannot be met at a
lower level. A number of simulations are required before the design
is completed. Simulation is expensive, especially circuit simulation,
which can accurately predict circuit performance. As the size of the
circuit increases, the cost of conventional circuit analysis
eventually becomes prohibitive. Therefore, large-scale circuit
analysis, in which relaxation techniques are used, has been developed
to reduce the execution time and memory requirements while still
providing adequate information about circuit performance.

This research is concerned with the development of numerical methods
and scheduling techniques for fast and relatively accurate
time-domain simulation of MOS LSI and VLSI circuits. The goal is for
the developed methods and techniques to be implemented in a simulator
that can be used as a design verification tool for MOS circuits.

The basic approach in most 'large-scale' circuit simulators is first
to partition the circuit into smaller subcircuits and then to analyze
these subcircuits in a certain sequence [8, 9, 10, 11, 12]. By using
analysis sequencing or selective trace techniques, one may take
advantage of the latency properties of the subcircuits in both time
and space to reduce the computation time [13, 14]. In this research,
MOS circuits are decomposed into 'one-way' subcircuits in the DC
sense [3, 8] (i.e., the circuit is assumed to be in steady-state with
its capacitors open-circuited). In practice, this partitioning
approach produces subcircuits of relatively small size, so sparse
matrix techniques are not necessary. By properly ordering the circuit
variables, the circuit equations can be put in an 'almost' lower
block triangular form, with the upper triangular nonzero terms
accounting for any feedback that might exist among the subcircuits.
Traditionally, the Gauss-Seidel method has been used to decouple the
feedback terms by assigning previous values to the current 'unsolved'
variables [8, 15]. However, this approach suffers from accuracy
problems. Furthermore, as will be shown later in this thesis, when
floating capacitors exist among the subcircuits, the Gauss-Seidel
method will not be consistent and thus not convergent. A 'Modified'
Gauss-Seidel-Newton method is therefore introduced to solve the
circuit equations and to decouple the feedback terms during the
analysis process. The proposed technique, which is based on the use
of a forward predictor to estimate the values of the yet unsolved
variables in feedback loops, is more accurate than the standard
Gauss-Seidel method, without requiring much additional computation.
This Modified Gauss-Seidel method is shown to be consistent, stable
and convergent. As far as analysis sequencing is concerned, a
procedure is described in this dissertation which differs from those
previously proposed in that it schedules only those 'relevant'
subcircuits that directly or indirectly affect the output. This
approach is combined with a latency technique to further increase the
speed of simulation.

An experimental program, PREMOS (PREdiction-based simulator for MOS
circuits), has been developed to implement and test the new
algorithms and schemes of this research. The program is intended
mainly for the time-domain analysis of VLSI MOS digital circuits. It
has been shown that PREMOS can produce simulated results whose
accuracy is close to that of conventional circuit simulation, while
it is generally no more than five times slower than timing simulators
such as MOTIS-C [6, 7]. The speed and circuit-size capability of
MOTIS-C have been claimed to be over two orders of magnitude greater
than those of SPICE2 [7]. Hence, PREMOS also has much greater speed
and circuit-size capability than SPICE2.

Chapter 2 briefly describes the analysis techniques used in
conventional circuit analysis and large-scale circuit analysis. In
Chapter 3 the analysis sequencing procedures are explained and the
idea of scheduling only 'relevant' parts is studied. The nonlinear DC
analysis methods adopted in the solution algorithms are discussed in
Chapter 4. Chapter 5 introduces the modified Gauss-Seidel method and
provides a numerical study of the method. In Chapter 6, latency
criteria and a timestep control scheme are described. Chapter 7
describes the structure of the program PREMOS and gives some
simulation examples. In the final chapter, Chapter 8, the conclusions
are presented and areas for future work are pointed out.

There  are  five  appendices.  Appendix  1  describes  MOS  device 
modeling  and  capacitor  modeling.  Appendix  2  contains  the  input 
descriptions  for  circuit  elements  and  their  models  in  the  program 
PREMOS.  The  control  commands  and  the  data  structure  used  in  PREMOS 
are  given  in  Appendix  3  and  Appendix  4,  respectively.  Appendix  5 
contains  an  input  data  file  to  PREMOS  for  an  example  studied  in 
Chapter  7. 
CHAPTER 2

Large-Scale  Circuit  Simulation 


2.1. Introduction

The  simulation  of  LSI  and  VLSI  circuits  in  their  entirety  falls 
beyond  the  capability  of  conventional  circuit  simulators.  On  the 
other  hand,  conventional  logic  simulators  can  only  give  the  results 
in terms of logic levels "1", "0" or "unknown" with the attendant
loss of detail in the waveforms. In recent years, many techniques
have been proposed to bridge the gap between circuit simulation and
logic simulation. The aim is to obtain a circuit-level type of
simulation with computational speeds approaching those of logic
simulation.
The  analysis  techniques  used  in  'conventional'  circuit  simulation  and 
'large-scale'  circuit  simulation  are  described  in  Sections  2.2  and 
2.3,  respectively.  In  Section  2.4,  a  discussion  on  problem  areas 
with  the  previous  methods  is  given. 


2.2. Conventional Circuit Analysis

The equations that describe an integrated circuit model are
generally a set of nonlinear (stiff) algebraic-differential equations
of the following form*:

$$ f(\dot{x}, x, t) = 0, \qquad x(0) = x_0 \tag{2.1} $$

*In the sequel we use lower case, such as $x$, to denote a vector,
$x_i$ the ith element of $x$, and upper case, such as $X$, a matrix.

Using an implicit integration formula, such as the backward Euler
formula, the trapezoidal rule, or one of Gear's formulas, (2.1) is
approximated at every time point $t_n$ by a set of nonlinear
algebraic equations of the form:

$$ g(x^n) = 0 \tag{2.2} $$

Eq.  (2.2)  is  usually  solved  by  using  a  modified  Newton's  method.  At 
every  iteration  in  the  Newton's  method,  the  linearized  equations  that 
have  to  be  solved  are  of  the  form: 


A  x  =  b  (2.3) 


A number of iterations may be necessary before the process converges
to a solution of (2.2). At every iteration, function and
Jacobian  evaluations  to  construct  the  matrix  A  in  (2.3),  as  well  as 
LU  decomposition  and  solution,  are  repeated.  In  practice,  the  matrix 
A  is  sparse  and  sparse  matrix  solution  techniques  can  be  used  to 
reduce  the  computational  requirements.  The  fundamental  algorithm  of 
circuit  analysis  can  be  summarized  as  follows: 

BEGIN
   BEGIN
      X = [ Voltages, Currents ]
      TIME = Start Time
      H = Initial Timestep
   END {initialization}

   TIME = TIME + H
   WHILE (TIME < End Time) DO
   BEGIN
      Discretize the differential operators by using an
      integration formula
      REPEAT
      BEGIN
         k = 1
         Evaluate linear models for circuit elements at the
         operating points and form the circuit matrix A
         and the source vector b
         Solve linear equations AX = b
      END
      UNTIL (convergence achieved) {dc loop}

      IF the local truncation error (LTE) is smaller than
         the tolerance
      THEN
      BEGIN
         Compute new timestep H
         TIME = TIME + H
      END
      ELSE
      BEGIN
         TIME = TIME - H
         Compute revised timestep H
         TIME = TIME + H
      END
   END {time loop}
END
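The loop above can be made concrete with a small sketch. The following Python fragment is only an illustration under stated assumptions: it treats a single scalar equation dx/dt = f(x), uses the backward Euler formula with a Newton inner loop, and estimates the LTE as half the gap between a forward Euler predictor and the backward Euler result. The function name `transient`, its parameters, and the step doubling/halving rule are hypothetical choices, not taken from the thesis or from any particular simulator.

```python
import math

def transient(f, dfdx, x0, t_end, h0, lte_tol=1e-3, newton_tol=1e-10):
    """Backward Euler transient analysis of dx/dt = f(x) (illustrative sketch)."""
    t, x, h = 0.0, x0, h0
    times, values = [t], [x]
    while t_end - t > 1e-12:              # time loop
        h = min(h, t_end - t)
        y = x                             # Newton start value
        for _ in range(50):               # "dc loop": solve g(y) = y - x - h*f(y) = 0
            g = y - x - h * f(y)
            dg = 1.0 - h * dfdx(y)        # linearized model at the operating point
            step = g / dg
            y -= step
            if abs(step) < newton_tol:
                break
        # Assumed LTE estimate: half the predictor/corrector difference.
        lte = 0.5 * abs(y - (x + h * f(x)))
        if lte <= lte_tol:
            t += h                        # accept the timepoint
            x = y
            times.append(t)
            values.append(x)
            h *= 2.0                      # try a larger step next
        else:
            h *= 0.5                      # reject and retry with a smaller step
    return times, values

# Example: a linear decay, dx/dt = -x, x(0) = 1, simulated up to t = 1.
times, values = transient(lambda x: -x, lambda x: -1.0, 1.0, 1.0, 0.1)
```

In a real circuit simulator the scalar Newton update is replaced by the sparse LU solution of A x = b, but the accept/reject structure of the time loop is the same.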


For large-scale circuits, sparse matrix techniques alone do not
produce simulation results in a reasonable time. To improve the speed
of computation, tearing or decomposition together with latency
detection and exploitation are used [12, 13, 14]. Depending on the
computer algorithm implemented and on the circuit being analyzed,
decomposition and latency checking can reduce the amount of
computation by a factor of two to five [14]. To gain more
computational speed, additional algorithms are needed, as described
in the next section.


2.3. Large-Scale Circuit Analysis


The  basic  idea  in  many  large-scale  circuit  simulators  is, 
firstly,  the  partitioning  of  the  circuit  into  smaller  subcircuits, 
and  then,  the  analysis  of  these  subcircuits  in  a  certain  sequence.  A 
number of algorithms have been proposed for the simulation of
partitioned circuits using analysis sequencing.


2.3.1. Point Gauss-Jacobi Algorithm

In this algorithm, the components of $x$ in (2.3) are obtained one
at a time by solving a sequence of scalar equations; i.e., at time
$t_{n+1}$, the kth component of $x^{n+1}$, $x_k^{n+1}$, is found by
solving the scalar equation:

$$ g_k(x_1^n, x_2^n, \ldots, x_{k-1}^n, x_k^{n+1}, x_{k+1}^n, \ldots, x_m^n) = 0 \tag{2.4} $$

Eq. (2.4) can then be solved using a Newton method. In MOTIS and
MOTIS-C [6, 7], a one-step regula-falsi iteration is used [16].

To illustrate this algorithm further, we partition the matrix A in
(2.3) into the form

$$ A = L + D + U \tag{2.5} $$

where L and U are strictly lower and strictly upper triangular
matrices and D is a diagonal matrix, as shown in Fig. 2.1. $L_i$ and
$U_i$ stand for the ith rows of the triangular matrices L and U. The
point Gauss-Jacobi algorithm is described as follows:

BEGIN
   BEGIN
      X = [ Voltages ]
      TIME = Start Time
      H = Initial Timestep
   END {initialization}

   TIME = TIME + H
   n = 1
   WHILE (TIME < End Time) DO
   BEGIN
      Discretize the differential operators by using an
      integration formula
      FOR node i, i=1 TO m DO
      BEGIN
         Evaluate linear models for nonlinear devices which
         are fanouts of ith node
         Form the row circuit matrix $D_{ii}$, $L_i$ and $U_i$,
         and the current source $b_i$
         Solve linear equation
            $D_{ii} X_i = b_i - L_i X^n - U_i X^n$
      END {sweep m nodes}
      Compute new timestep H
      TIME = TIME + H
      n = n + 1
   END {time loop}
END


Since most automatic timestep control schemes are expensive for
large-scale circuit simulation, a fixed timestep during analysis has
been used in some simulators like MOTIS and MOTIS-C.
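As a hedged illustration of the node sweep, the Python sketch below applies one point Gauss-Jacobi sweep to an already linearized system A x = b: every node i is updated from the values of the previous timepoint only. The function name `jacobi_sweep` and the small 2 by 2 example are assumptions made for illustration; device evaluation and timestep handling are omitted.

```python
def jacobi_sweep(A, b, x_old):
    """One point Gauss-Jacobi sweep: D_ii x_i = b_i - L_i x_old - U_i x_old."""
    m = len(b)
    x_new = [0.0] * m
    for i in range(m):
        # Both the lower (L_i) and upper (U_i) row terms use old values.
        off_diag = sum(A[i][j] * x_old[j] for j in range(m) if j != i)
        x_new[i] = (b[i] - off_diag) / A[i][i]
    return x_new

# Example: a diagonally dominant system whose exact solution is [1, 2].
A = [[4.0, 1.0], [2.0, 5.0]]
b = [6.0, 12.0]
x = [0.0, 0.0]
for _ in range(50):          # repeated sweeps approach the exact solution
    x = jacobi_sweep(A, b, x)
```

A timing simulator that takes a single sweep per timepoint stops after the first call, which is where the accuracy questions discussed later in this chapter arise.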


2.3.2. Point Gauss-Seidel Algorithm


In this algorithm, the Gauss-Seidel technique is used to solve
(2.3). At every iteration, one solves a sequence of scalar equations
of the form:

$$ g_k(x_1^{n+1}, x_2^{n+1}, \ldots, x_{k-1}^{n+1}, x_k^{n+1}, x_{k+1}^n, \ldots, x_m^n) = 0 \tag{2.6} $$

The above equation could be solved by using Newton's method. In
SPLICE [15] only one iteration is made.

Using the same expressions $L_i$, $U_i$ and $D_{ii}$ as in the last
section, the point Gauss-Seidel algorithm can be described as
follows:


BEGIN
   BEGIN
      X = [ Voltages ]
      TIME = Start Time
      H = Initial Timestep
   END {initialization}

   TIME = TIME + H
   n = 1
   WHILE (TIME < End Time) DO
   BEGIN
      Discretize the differential operators by using an
      integration formula
      FOR node i, i=1 TO m DO
      BEGIN
         Evaluate linear models for nonlinear devices which
         are fanouts of ith node
         Form the row circuit matrix $D_{ii}$, $L_i$ and
         $U_i$, and the current source $b_i$
         Solve linear equation
            $D_{ii} X_i = b_i - L_i X^{n+1} - U_i X^n$
      END {sweep m nodes}
      Compute new timestep H
      TIME = TIME + H
      n = n + 1
   END {time loop}
END
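The difference from the Gauss-Jacobi sweep is only in which values feed the lower-triangular terms. The following Python sketch (the name `gauss_seidel_sweep` and the example matrices are illustrative assumptions) performs one sweep on a linearized system and shows that a lower triangular, i.e. feedback-free, system is solved exactly in a single sweep, while an upper-triangular feedback term leaves an error after one sweep.

```python
def gauss_seidel_sweep(A, b, x_old):
    """One point Gauss-Seidel sweep: D_ii x_i = b_i - L_i x_new - U_i x_old."""
    m = len(b)
    x = list(x_old)
    for i in range(m):
        lower = sum(A[i][j] * x[j] for j in range(i))             # already updated
        upper = sum(A[i][j] * x_old[j] for j in range(i + 1, m))  # previous values
        x[i] = (b[i] - lower - upper) / A[i][i]
    return x

# No feedback (lower triangular): one sweep gives the exact solution [1, 2].
one_way = gauss_seidel_sweep([[2.0, 0.0], [1.0, 4.0]], [2.0, 9.0], [0.0, 0.0])

# With a feedback term in the upper triangle, one sweep is only approximate
# (the exact solution of this second system is also [1, 2]).
with_feedback = gauss_seidel_sweep([[4.0, 1.0], [2.0, 5.0]], [6.0, 12.0], [0.0, 0.0])
```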


2.3.3. Block Gauss-Seidel-Newton Algorithm

From the network point of view, the point Gauss-Jacobi and the point
Gauss-Seidel methods are equivalent to decomposing the network at
every node. In the block Gauss-Seidel-Newton algorithm, the network
is decomposed into subcircuits, which may consist of more than one
node. A Gauss-Seidel-Newton method is then used to solve the
partitioned system of equations, which now becomes a sequence of
subcircuit equations, rather than scalar equations, of the form:

$$ g_k(x_1^{n+1}, x_2^{n+1}, \ldots, x_{k-1}^{n+1}, x_k^{n+1}, x_{k+1}^n, \ldots, x_m^n) = 0 \tag{2.7} $$

where $x_k$ is now a vector. Note that a modified Newton-Raphson
method could be used in solving each subcircuit.

In this algorithm, the matrix A is partitioned into the form

$$ A = L' + D' + U' \tag{2.8} $$

where L' and U' are strictly lower block and strictly upper block
triangular matrices and D' is the block diagonal matrix (Fig. 2.2).
Here, $L'_i$ and $U'_i$ represent the ith block matrices of L' and
U', respectively, and $D'_{ii}$ is the ith block diagonal matrix. The
block Gauss-Seidel-Newton algorithm can be described as follows:


BEGIN
   BEGIN
      X = [ Voltages ]
      TIME = Start Time
      H = Initial Timestep
   END {initialization}

   TIME = TIME + H
   n = 1
   WHILE (TIME < End Time) DO
   BEGIN
      Discretize the differential operators by using an
      integration formula
      FOR subcircuit i, i=1 TO m DO
      BEGIN
         REPEAT
         BEGIN
            Evaluate linear models for nonlinear devices of
            ith subcircuit
            Form the block circuit matrix $D'_{ii}$, $L'_i$ and
            $U'_i$, and the current source vector $b_i$
            Solve linear equations
               $D'_{ii} X_i = b_i - L'_i X^{n+1} - U'_i X^n$
         END
         UNTIL (nonlinear converged) {dc loop for ith subcircuit}
      END {sweep m subcircuits}
      Compute new timestep H
      TIME = TIME + H
      n = n + 1
   END {time loop}
END
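A minimal sketch of one block sweep on a linearized system is given below, assuming each subcircuit is a list of variable indices and that each diagonal block is small enough for dense elimination. The helper names `solve_dense` and `block_gauss_seidel_sweep` and the 4 by 4 example are hypothetical; the Newton loop for nonlinear devices is not shown, since a linear block converges in one solve.

```python
def solve_dense(M, r):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(r)
    aug = [[float(v) for v in M[i]] + [float(r[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(aug[k][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for k in range(c + 1, n):
            factor = aug[k][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[k][j] -= factor * aug[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def block_gauss_seidel_sweep(A, b, x_old, blocks):
    """One sweep: D'_ii X_i = b_i - L'_i X_new - U'_i X_old for each block i."""
    x = list(x_old)
    m = len(b)
    for blk in blocks:
        inside = set(blk)
        # Earlier blocks in x are already updated; later blocks still hold old values.
        rhs = [b[i] - sum(A[i][j] * x[j] for j in range(m) if j not in inside)
               for i in blk]
        D = [[A[i][j] for j in blk] for i in blk]
        for i, value in zip(blk, solve_dense(D, rhs)):
            x[i] = value
    return x

# Two 2-node subcircuits in lower block triangular form (no feedback).
A = [[2, 1, 0, 0], [1, 3, 0, 0], [1, 0, 4, 1], [0, 1, 1, 3]]
b = [4, 7, 17, 17]                     # chosen so that the solution is [1, 2, 3, 4]
x = block_gauss_seidel_sweep(A, b, [0.0] * 4, [[0, 1], [2, 3]])
```

Because the example has no upper block terms, a single sweep reproduces the exact solution; with feedback between the blocks the sweep would have to be repeated or the feedback terms approximated.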


2.3.4. Waveform Relaxation Method

For the three algorithms described above, the circuit analysis
proceeds by small timesteps at the global level, as is done in
conventional circuit simulation. In the waveform relaxation method,
the waveforms are obtained for a time interval at the subcircuit
level and a number of waveform iterations are then taken to converge
to the solution [17]. Either the Gauss-Jacobi method or the
Gauss-Seidel method could be used in the waveform relaxation
algorithm. The Gauss-Seidel waveform relaxation algorithm can be
described as follows:


BEGIN
   X = [ Voltages, Currents ]
   n = 1
   WHILE ($\delta^n$ > Tolerance) DO
   BEGIN
      FOR subcircuit i, i=1 TO m DO
      BEGIN
         FOR time t=0 TO t=End Time DO
         BEGIN
            Solve nonlinear equations
               $\dot{X}_i^{n+1} = f_i(X_1^{n+1}, \ldots, X_{i-1}^{n+1}, X_i^{n+1}, X_{i+1}^n, \ldots, X_m^n)$
               and $X^{n+1}(0) = X^n(0)$
         END
      END {sweep m subcircuits}
      $\delta^{n+1} = \max_i \max_t | X_i^{n+1}(t) - X_i^n(t) |$
      n = n + 1
   END {waveform iteration loop}
END

The waveform relaxation method is attractive since the subcircuits
are analyzed independently and thus different timesteps could be
used. However, the method suffers from convergence problems when
strong feedback exists. Furthermore, a number of iterations are
needed for the waveforms to converge to the solution, and a large
memory is required to store the entire waveforms at each iteration.
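To make the waveform iteration concrete, here is a small Python sketch of Gauss-Seidel waveform relaxation on an assumed pair of weakly coupled linear equations, dx1/dt = -2 x1 + 0.5 x2 + 1 and dx2/dt = -2 x2 + x1, both starting from zero. The system, the fixed backward Euler grid and the function name `waveform_relaxation` are illustrative assumptions only. Each "subcircuit" is integrated over the whole interval while the other waveform is held fixed, and whole waveforms are stored between iterations, which is the memory cost noted above.

```python
def waveform_relaxation(t_end=10.0, steps=200, tol=1e-8, max_iter=100):
    """Gauss-Seidel waveform relaxation for two coupled scalar equations."""
    h = t_end / steps
    x1 = [0.0] * (steps + 1)        # initial guess: constant waveforms
    x2 = [0.0] * (steps + 1)
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Subcircuit 1 over the full interval, using x2 from the previous iteration.
        new1 = [x1[0]]
        for k in range(steps):
            new1.append((new1[k] + h * (0.5 * x2[k + 1] + 1.0)) / (1.0 + 2.0 * h))
        # Subcircuit 2 over the full interval, using the new x1 waveform.
        new2 = [x2[0]]
        for k in range(steps):
            new2.append((new2[k] + h * new1[k + 1]) / (1.0 + 2.0 * h))
        # Largest change of any waveform sample between iterations.
        delta = max(max(abs(p - q) for p, q in zip(new1, x1)),
                    max(abs(p - q) for p, q in zip(new2, x2)))
        x1, x2 = new1, new2
        if delta < tol:
            break
    return x1, x2, iteration

x1, x2, iterations = waveform_relaxation()
```

With the weak coupling chosen here the iteration settles quickly; increasing the coupling coefficients slows convergence, in line with the remark about strong feedback.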


2.4. Problems with the Previous Methods

In  general,  the  large-scale  circuit  simulation  algorithms 
described  above  have  the  following  features  in  common: 

(1)  decomposing  the  entire  circuit  into  small  subcircuits  and 
adopting  the  circuit  analysis  for  each  subcircuit  sequentially 

(2)  using  relaxation  methods  in  solving  the  circuits 

(3)  using  simplified  device  models  for  circuit  elements. 

There are clearly tradeoffs between the speed of simulation and the
accuracy of the simulated results; the appropriate balance depends
upon the accuracy requirement. The resulting problems and their
impact on large-scale circuit simulation are discussed below:

(i)  circuit  decomposition 

As the size of the circuit increases, the time required to solve the
circuit equations grows rapidly and becomes the dominant cost of
conventional circuit analysis [15]. Decomposing the circuit into
small subcircuits and analyzing the circuit at the subcircuit level
reduces the computation time, because it then grows almost linearly
with circuit size. As the average size of the subcircuits is reduced,
the growth in total computation time becomes more nearly linear. For
example, the subcircuit size in [6, 7] is always one. However, if the
size of each subcircuit is forced to be one, then the interactions
among the subcircuits may be strong, which could affect the accuracy
of the one-iteration Gauss-Jacobi or Gauss-Seidel approach. This
problem is discussed in Chapter 4.

(ii)  device  modeling 

There are generally two forms for representing device characteristic
models: functional form and tabular form. The former is generally
used in circuit simulation, where nonlinear model equations and
parameters are employed to describe the operation of the devices. The
latter is often used in timing simulation and piecewise-linear
analysis methods. The tabular models could be in one-dimensional or
two-dimensional form [6, 7, 8, 18]. Depending on the requirements,
either one or a combination of these two approaches can be used for
device modeling. In Appendix 1, both MOS device modeling and
capacitor modeling for VLSI circuits are discussed.

(iii)  nonlinear  iterations 

Most circuit simulation programs use the (modified) Newton-Raphson
algorithm to determine the solution of a nonlinear system of
algebraic equations. The criterion for the convergence of the
iterative solutions is the requirement that the vector of circuit
variables agree with the prior solution within a specified tolerance.
For large-scale circuit analysis, it is rather expensive to use the
same convergence criterion as in conventional circuit simulation. In
[6, 7], only one iteration is taken at each timepoint. In [8], a
relaxation method is used to reduce the number of nonlinear
iterations. More on this topic is discussed in Chapter 4.

(iv) timestep control scheme

In conventional circuit simulation, the timestep is usually
controlled by the local truncation error (LTE) [5]. For large-scale
circuit simulation, it is too costly to use the LTE timestep control
scheme. Some of the existing techniques use a fixed timestep scheme
[6, 7]. A simple variable timestep control scheme, where the internal
timestep changes according to circuit activity, is adopted in [18].
In [8], an iteration-count timestep control scheme is used. The
timestep control scheme is explored further in Chapter 6.

(v)  accuracy 

From the accuracy point of view, it has been found that both the
Gauss-Jacobi and the Gauss-Seidel methods tend to produce a response
that lags behind the actual response. In Chapter 5, a modified
Gauss-Seidel method is described to solve the partitioned system of
equations. It is shown, by examples, that the new method is more
accurate than the standard Gauss-Seidel method. The proposed modified
Gauss-Seidel method is proved to be consistent, stable and
convergent.


CHAPTER  3 


Analysis  Sequencing 


3.1. Introduction

In order to analyze a large-scale circuit, the entire circuit is
usually first partitioned into smaller 'one-way' subcircuits, and
these subcircuits are then analyzed in a certain sequence [9].
Creating 'one-way' subcircuits requires, in general, the introduction
of some approximations. For MOS circuits, 'one-way' subcircuits are
created by decoupling the gate-to-drain capacitance. To allow the
subcircuits to be analyzed independently in sequence, a scheduling
scheme is followed. This scheduling process is called analysis
sequencing.

By properly defining the subcircuits in a combinational logic
circuit, the overall circuit equations can be ordered into a lower
block-triangular form (LBT) [12], so that analysis sequencing can be
applied most efficiently. In general, when there is feedback among
the subcircuits, such as in sequential circuits, the circuit
equations cannot be ordered into a lower block-triangular form unless
the sizes of the subcircuits are increased to include the feedback
[9]. Alternatively, the (block) Gauss-Jacobi or (block) Gauss-Seidel
technique may be used to decouple the equations for analysis
sequencing and to keep the sizes of the subcircuits relatively small.
This decoupling in effect 'breaks' the feedback in the analysis
sequencing procedure.

In Section 3.2, some mathematical properties of directed graphs are
discussed. Section 3.3 summarizes previous work on 'levelizing' the
vertices in an acyclic directed graph. Two new algorithms based on
programming data structures are developed in this thesis and
described in Section 3.4; one algorithm uses a stack and the other a
queue. Examples are shown to compare the differences between these
two algorithms. Since in many cases feedback may exist in the
network, an algorithm for checking feedback paths is studied in
Section 3.5. In addition to latency and selective trace (or
event-driven) techniques, an important idea for further saving of CPU
time and memory is to schedule only those subcircuits that directly
or indirectly affect the outputs in the circuit analysis. This
concept of scheduling only 'relevant' subcircuits, together with a
corresponding algorithm, is described in detail in Section 3.6. An
algorithm for analysis sequencing using parallel processing is
proposed in Section 3.7. The last section, Section 3.8, gives the
conclusions.
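The idea of scheduling only 'relevant' subcircuits can be sketched as a backward reachability search from the output vertices. The Python fragment below is one plausible realization under stated assumptions, not the algorithm of Section 3.6: the function name `relevant_vertices`, the edge-list input format and the example graph are all hypothetical.

```python
from collections import deque

def relevant_vertices(edges, outputs):
    """Return the vertices that directly or indirectly drive any output."""
    fanin = {}
    for src, dst in edges:                 # edge (src, dst): src feeds dst
        fanin.setdefault(dst, set()).add(src)
    relevant = set(outputs)
    queue = deque(outputs)
    while queue:                           # walk the graph against edge direction
        v = queue.popleft()
        for w in fanin.get(v, ()):
            if w not in relevant:
                relevant.add(w)
                queue.append(w)
    return relevant

# Subcircuits 4 and 5 do not influence output 3, so they need not be scheduled.
edges = [(1, 2), (2, 3), (2, 4), (5, 4)]
needed = relevant_vertices(edges, [3])
```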


3.2. Mathematical Properties

A circuit which is composed of unilateral subcircuits can be
represented by a directed graph G(V,E), where each vertex in V
corresponds to a subcircuit and each edge in E corresponds to a
signal line from fanout to fanin. The mathematical properties of
directed graphs can be found in many books on graph theory, such as
[19, 20, 21]. To simplify the description of the scheduling
algorithms proposed in this thesis, the following definitions and
derivations are given :

Definition 3.1 :

Given a vertex v of G(V,E), the sets of fanin vertices and fanout
vertices of v are defined as

$$ \mathrm{fin}(v) = \{\, w \in V \mid (w,v) \in E \,\} $$

$$ \mathrm{fout}(v) = \{\, w \in V \mid (v,w) \in E \,\} \tag{3.1} $$

The numbers of fanin and fanout vertices of v are defined as nfin(v)
and nfout(v), respectively.

The adjacency matrix $X = [x_{ij}]$ of the directed graph G(V,E) is
defined as an n by n matrix whose element

$$ x_{ij} = \begin{cases} 1 & \text{if there is an edge directed from the ith vertex to the jth vertex} \\ 0 & \text{otherwise.} \end{cases} $$

A directed graph and its adjacency matrix are shown in Fig. 3.1. It
is easy to observe the following two properties :

1. nfin(v) is the sum of the column corresponding to vertex v.

2. nfout(v) is the sum of the row corresponding to vertex v.

Note that any set of parallel directed edges in G(V,E) will be
treated as one edge, without affecting the analysis sequencing. If X
is the adjacency matrix of G(V,E), then the transposed matrix $X^T$
is the adjacency matrix of a directed graph $G^T(V,E)$ obtained by
reversing the direction of every edge in G(V,E). The following
relation can be derived:

nfin(v) in $G^T(V,E)$ equals nfout(v) in G(V,E), and nfout(v) in
$G^T(V,E)$ equals nfin(v) in G(V,E).
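These properties are easy to check numerically. The short Python sketch below builds the adjacency matrix of a small assumed graph (vertices numbered from 0, function names chosen for illustration) and confirms that column sums give nfin, row sums give nfout, and that transposition swaps the two.

```python
def adjacency_matrix(n, edges):
    """Build X with X[i][j] = 1 if there is an edge from vertex i to vertex j."""
    X = [[0] * n for _ in range(n)]
    for i, j in edges:
        X[i][j] = 1          # parallel edges collapse into a single entry
    return X

def nfout(X, v):
    return sum(X[v])                     # row sum

def nfin(X, v):
    return sum(row[v] for row in X)      # column sum

def transpose(X):
    return [list(col) for col in zip(*X)]

# Three vertices; the edge (0, 1) is listed twice to show parallel edges.
X = adjacency_matrix(3, [(0, 1), (0, 2), (1, 2), (0, 1)])
XT = transpose(X)            # adjacency matrix of the edge-reversed graph
```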

For  the  directed  graph  G(V,E)  with  no  feedback  loops  (i.e., 
G(V,E)  is  acyclic),  the  analysis  sequencing  is  to  reorder  the  rows 
(columns)  in  the  corresponding  adjacency  matrix  X  and  make  the  matrix 
X  upper  triangular.  For  illustration,  the  following  definitions  are 
made. 

Definition 3.2 :

Vertex $v_i$ in G(V,E) is a predecessor of vertex $v_j$ if and only
if there is a directed path from $v_i$ to $v_j$. If $v_i$ is a
predecessor of $v_j$, then $v_j$ is a successor of $v_i$.

Definition 3.3 :

A linear ordering is called a topological order if it has the
property that if $v_i$ is a predecessor of $v_j$ in the network, then
$v_i$ precedes $v_j$ in the linear ordering.

For  an  acyclic  directed  graph,  the  analysis  sequencing  is  to 
arrange  the  vertices  in  a  topological  order  and  the  following  theorem 
is  obtained. 

Theorem 3.1 [19] :

The vertices in a directed graph can be arranged in a topological
order if and only if the directed graph is acyclic.
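The theorem can be illustrated with a standard fanin-counting procedure (often called Kahn's algorithm), which either produces a topological order or detects that none exists. This Python sketch is an illustration only and is not one of the algorithms developed in the thesis; the function name and the vertex numbering from 0 are assumptions.

```python
from collections import deque

def topological_order(n, edges):
    """Return a topological order of vertices 0..n-1, or None if a cycle exists."""
    fanout = [[] for _ in range(n)]
    nfin = [0] * n
    for i, j in sorted(set(edges)):      # parallel edges are treated as one edge
        fanout[i].append(j)
        nfin[j] += 1
    ready = deque(v for v in range(n) if nfin[v] == 0)
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for w in fanout[v]:
            nfin[w] -= 1                 # v has been scheduled; remove its edge
            if nfin[w] == 0:
                ready.append(w)
    # Vertices on a cycle never reach nfin == 0, so the order stays incomplete.
    return order if len(order) == n else None

acyclic = topological_order(3, [(0, 1), (1, 2), (0, 2)])
cyclic = topological_order(2, [(0, 1), (1, 0)])
```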
In general, when feedback loops exist in the directed graph, the
graph is no longer acyclic. In this case, analysis sequencing
procedures need to be extended to check for the feedback paths and to
schedule the analysis of the subcircuits in the proper sequence.


3.3. Previous Work

Over the past few years, two methods have been proposed for
sequencing the vertices of general directed graphs which are not
necessarily acyclic. One method is first to construct a new acyclic
directed graph G' by contracting the vertices in each strongly
connected component of the original graph G into a new vertex in G'
[9]. Tarjan's algorithm [22] could be used to find the strongly
connected components of G in linear time. The vertices in G' are then
levelized and scheduled by Algorithm 3.1 given below.

Algorithm 3.1 [9]

The notation nu($v_i$) is the updated number of fanin vertices of
$v_i$ after all the scheduled vertices have been removed.

Part I : (assignment of vertices of G'(V,E) to levels)

BEGIN
   Assign input vertices of G'(V,E) to level 0;
   k <- 0;
L: FOR each vertex v in level k DO
      FOR each vertex w ∈ fout(v) DO
      BEGIN
         nfin(w) <- nfin(w)-1;
         IF nfin(w)=0 THEN
            assign w to level k+1;
      END
   k <- k+1;
   IF level k is not empty THEN
      GO TO L;
END

Part II : (scheduling the analysis of the subcircuits)

BEGIN
   k <- 1;
L: FOR each vertex v of G'(V,E) at level k DO
      time analysis of corresponding subcircuits;
   k <- k+1;
   IF level k is non-empty THEN
      GO TO L;
END

Algorithm 3.1 is illustrated by the following example :

Example 3.1 :

For the directed graph G'(V,E) shown in Fig. 3.2, Algorithm 3.1
levelizes it with depth 8 :

Level     0    1    2    3    4    5    6    7    8

          1    7   13   17   20   21   22   25   27
          2    8   14   18             23   26   28
          3    9   15   19             24
          4   10   16
          5   11
          6   12

Remark 3.1 :

The principle of levelizing the vertices is that a vertex v is
in level k if all the vertices of fin(v) belong to levels numbered
from 0 to k-1. The depth of an acyclic directed graph is the maximum
value of levels [23].
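
The levelizing principle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name levelize, the dictionary representation of fout, and the five-vertex graph are hypothetical and do not come from the original program.

```python
from collections import defaultdict

def levelize(fout, inputs):
    """Assign each vertex of an acyclic directed graph to a level.

    fout   : dict mapping a vertex to the list of its fanout vertices
    inputs : vertices with no fanin (placed in level 0)
    """
    # nfin[w] counts the fanin edges of w that are not yet accounted for
    nfin = defaultdict(int)
    for v, successors in fout.items():
        for w in successors:
            nfin[w] += 1

    levels = [list(inputs)]
    while levels[-1]:
        next_level = []
        for v in levels[-1]:
            for w in fout.get(v, []):
                nfin[w] -= 1
                if nfin[w] == 0:      # every fanin of w is already placed
                    next_level.append(w)
        levels.append(next_level)
    levels.pop()                      # drop the trailing empty level
    return levels


# Hypothetical five-vertex graph: a and b are input vertices.
example = {"a": ["c"], "b": ["c", "d"], "c": ["e"], "d": ["e"], "e": []}
levels = levelize(example, ["a", "b"])
depth = len(levels) - 1
```

For this small graph the vertices fall into three levels, so the depth is 2.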

Remark 3.2 :

In large-scale circuit analysis, one should try to keep the size
of the subcircuits small in order to make the total analysis time
linearly proportional to the size of the entire circuit. However,
the sizes of the subcircuits (or vertices) after contraction in
Algorithm 3.1 could become too large for the analysis to be
efficient. For example, an N-stage ring oscillator would be contracted
into one subcircuit instead of N subcircuits.
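
The ring oscillator case can be checked with a compact recursive form of Tarjan's strongly connected component algorithm. The sketch below is an assumption-laden illustration: the function name, the graph encoding, and the five-stage ring with one output buffer "out" are hypothetical, and the recursive form is only suitable for small graphs.

```python
def strongly_connected_components(fout):
    """Tarjan's algorithm; returns a list of components (lists of vertices)."""
    index = {}        # discovery order of each vertex
    low = {}          # smallest index reachable from the vertex
    on_stack = set()
    stack = []
    components = []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in fout.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:        # v is the root of a component
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(component)

    for v in list(fout):
        if v not in index:
            visit(v)
    return components


# Hypothetical 5-stage ring oscillator driving one output buffer "out".
N = 5
ring = {i: [(i + 1) % N] for i in range(N)}
ring[0] = [1, "out"]
ring["out"] = []
components = strongly_connected_components(ring)
largest = max(components, key=len)
```

All N stages end up in a single component, which is the contraction effect described in the remark.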

Another method [8] deals with the directed graph G(V,E) directly
without contracting it as is done in Algorithm 3.1. When a feedback
loop is found, the loop is broken. Levelizing the vertices in this
way is performed by using the following algorithm.

Algorithm 3.2 [8]

For describing the algorithm, the following notations are
defined :

adj(v) : set of adjacent vertices corresponding to the set of
the incoming edges of vertex v.

la(v) : label of vertex v.

Procedure :

1. Set la(v_i)=0 for each vertex v_i of G(V,E).

2. Set la(v_i)=1 for each vertex v_i which corresponds to an input
signal terminal.

k=1.

3. k=k+1.

Choose a vertex v_i where la(v_i)=0 and la(v_j)≠0 for all v_j
∈ adj(v_i); if there is no such vertex, choose a vertex v_i
connecting to a vertex which has the lowest label. Set la(v_i)=k.

4. Repeat step (3) until all the vertices in G(V,E) are labeled.

Remark 3.3 :

The level in Algorithm 3.1 is equivalent to the label in
Algorithm 3.2, except that the label in Algorithm 3.2 starts from one
instead of zero.

Remark 3.4 :

It is claimed in [8] that Algorithm 3.2 can find all feedback
loops which will be cut during the analysis sequencing. But, as
shown in the following example, Algorithm 3.2 sometimes fails in
identifying the proper feedback loops.


Example 3.2 :

For the directed graph shown in Fig. 3.3, it is obvious that
(7,3) is the feedback path that should be identified. The correct
sequence for this directed graph is as follows :

Label :    1    2    3    4

           1    3    4    5
           2         6    7

But by using Algorithm 3.2, the following sequence could be produced :

Label :    1    2    3    4    5    6

           1    4    5    3    6    7
           2

In  the  next  section,  a  more  accurate  algorithm  for  identifying  the 
feedback  paths  will  be  given. 


3.4.  New Algorithms Suitable for Computer Implementation

In the computer implementation of scheduling algorithms, data
structuring is important, particularly when multi-processor computer
configurations are to be taken into consideration. In this section,
two scheduling algorithms based on two different data structures are
described. The first is based on a stack and the second on a queue.
For a single processor computer, both algorithms will have the same
performance since only one task can be carried out at any given time.
However, for a multi-processor computer, the second algorithm, whose
data structure is based on a queue, is more efficient than the first
algorithm.

Fig. 3.3  A Directed Graph.

Definition 3.4 :

A  stack  is  a  linear  list  for  which  all  insertions  and  deletions 
(and  usually  all  accesses)  are  made  at  one  end  of  the  list.  So  the 
stack  is  a  last-in-first-out  ("LIFO")  list  [24]. 

Definition 3.5 :

A queue is a linear list for which all insertions are made at
one end of the list; all deletions (and usually all accesses) are
made at the other end. So the queue is a first-in-first-out ("FIFO")
list [24].
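
The difference between the two disciplines can be seen with standard Python containers. This is a minimal sketch: the task names are placeholders, and the choice of a list for the stack and collections.deque for the queue is an assumption of this illustration, not something prescribed by the text.

```python
from collections import deque

stack = []                 # a Python list used as a LIFO stack
for task in ("a", "b", "c"):
    stack.append(task)     # insert at the top
lifo_order = [stack.pop() for _ in range(3)]       # delete from the top

queue = deque()            # a deque used as a FIFO queue
for task in ("a", "b", "c"):
    queue.append(task)     # insert at the rear
fifo_order = [queue.popleft() for _ in range(3)]   # delete from the front
```

The stack returns the tasks in reverse order of insertion, while the queue returns them in the order they were inserted.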
In [25], a topological sorting algorithm based on a stack is
proposed. The computing time is O(|V|+|E|), which is linear in the size
of the problem. The following algorithm performs analysis sequencing
using the stack data structure.


Algorithm 3.3

The notation s(i), i=1,m, is the sequence of analyzing vertices
v_j, j=1,m, for the entire directed graph G(V,E).

BEGIN
   k=1
   FOR each vertex v_i in G(V,E) DO
   BEGIN
      nu(v_i)=nfin(v_i)
      IF nu(v_i)=0
      THEN push the vertex v_i into the stack S
   END

   REPEAT
      IF the stack S is not empty
      THEN
      BEGIN
         pop out v_j from the stack S
         s(k)=v_j
         k=k+1
         FOR each vertex v_i in fout(v_j) DO
         BEGIN
            nu(v_i)=nu(v_i)-1
            IF nu(v_i)=0
            THEN push v_i into the stack S
         END
      END
      ELSE
      BEGIN
         check the feedback path
         (see the Algorithm 3.5)
         push the associated vertex into the stack S
      END
   UNTIL all vertices in G(V,E) are scheduled (k>m)
END


Example  3.3  : 


For  the  directed  graph  in  Fig.  3.2,  Algorithm  3.3  gives  the 


following  sequence  : 


6  5  12  4  11  3  2  10  16  9  15  18  14  19 

20  1  8  13  7  17  21  24  23  26  28  22  25  27 


Fig. 3.4 shows the flow of this sequence.


Algorithm 3.4 is similar to Algorithm 3.3 except that Algorithm
3.4 is based on the queue instead of the stack. In the
implementation, a circular queue is used to prevent memory overrun.


Algorithm 3.4

BEGIN
   k=1
   FOR each vertex v_i in G(V,E) DO
   BEGIN
      nu(v_i)=nfin(v_i)
      IF nu(v_i)=0
      THEN push the vertex v_i into the queue Q
   END

   REPEAT
      IF the queue Q is not empty
      THEN
      BEGIN
         pop out v_j from the queue Q
         s(k)=v_j
         k=k+1
         FOR each vertex v_i in fout(v_j) DO
         BEGIN
            nu(v_i)=nu(v_i)-1
            IF nu(v_i)=0
            THEN push v_i into the queue Q
         END
      END
      ELSE
      BEGIN
         check the feedback path
         (see the Algorithm 3.5)
         push the associated vertex into the queue Q
      END
   UNTIL all vertices in G(V,E) are scheduled (k>m)
END


Fig. 3.4  The Sequence Given by Algorithm 3.3 for the Directed
Graph in Fig. 3.2.


Algorithm  3.4  is  illustrated  by  Example  3.4. 

Example 3.4 :

For the directed graph in Fig. 3.2, Algorithm 3.4 gives the
following sequence :

1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16 

17  18  19  20  21  22  23  24  25  26  27  28 

The  flow  diagram  of  this  sequence  is  shown  in  Fig.  3.5. 

Remark 3.5 :

Comparing the sequences shown in Fig. 3.4 and 3.5, it can be
seen that the algorithm based on the stack generates the sequence by
using depth-first search, while the algorithm based on the queue
constructs the sequence by selecting all the vertices at one level and
advancing level by level.
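
The contrast between the two orderings can be reproduced with a short sketch that shares one routine for both data structures. Everything here is illustrative: the function name sequence, the six-vertex graph, and the use of collections.deque for both disciplines are assumptions, and the feedback check is omitted, so the sketch applies to acyclic graphs only.

```python
from collections import deque

def sequence(fout, use_stack):
    """Topological sequencing with a stack (LIFO) or a queue (FIFO)."""
    nu = {v: 0 for v in fout}                 # remaining fanin counts
    for v in fout:
        for w in fout[v]:
            nu[w] += 1
    ready = deque(v for v in fout if nu[v] == 0)
    order = []
    while ready:
        v = ready.pop() if use_stack else ready.popleft()
        order.append(v)
        for w in fout[v]:
            nu[w] -= 1
            if nu[w] == 0:                    # all fanins of w are scheduled
                ready.append(w)
    return order


# Hypothetical acyclic graph with two independent chains 1->3->5 and 2->4->6.
g = {1: [3], 2: [4], 3: [5], 4: [6], 5: [], 6: []}
stack_order = sequence(g, use_stack=True)
queue_order = sequence(g, use_stack=False)
```

With the stack, one chain is followed to its end before the other is started; with the queue, the vertices come out level by level.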


3.5.  Discussion on Checking Feedback Path

For the acyclic directed graph, the vertices can be arranged in
a topological order by any analysis sequencing procedure. But
feedback loops may exist in many networks. As stated in Section 3.3,
the feedback loops can be avoided by contracting each strongly
connected component into a new vertex, and the newly constructed
directed graph is then acyclic [9]. In some cases, this
approach may not be efficient since the newly generated subcircuit
could be very large. For solving large-scale networks, the
Gauss-Seidel technique is widely used; this technique is equivalent to
breaking the feedback paths [8, 15]. Thus it is necessary to check
and identify the feedback paths in the graph. Furthermore, as the
predictor method [see Chapter 5] will be adopted to predict the
voltage on the feedback loop, it is necessary to store information
concerning the feedback checking.

In Algorithm 3.3 (Algorithm 3.4), the stack (the queue) stores
the 'unscheduled' vertices whose fanin vertices have been scheduled.
Thus the vertices in the stack (queue) are ready tasks for sequencing
[26]. If there are feedback loops in the network, an empty stack or
empty queue would result before all the vertices in the analysis
sequencing procedures have been scheduled. It is assumed that all
feedback paths are single; i.e., that only one feedback path enters a
vertex, and that no feedback enters the input vertices (in general,
input vertices correspond to independent sources). It is
straightforward to obtain the following lemma.

Lemma 3.2 :

In a directed graph G(V,E), a vertex v with a single feedback
path entering it is identified if

(1) nu(v) = 1 at the point of sequencing, and

(2) nfin(v) > 1.
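
The two conditions translate directly into a filter over the unscheduled vertices. The sketch below is hypothetical: the function name and the fanin counts are assumed for a seven-vertex graph in which the input vertices 1 and 2 have already been scheduled.

```python
def feedback_candidates(nu, nfin, scheduled):
    """Unscheduled vertices with nu(v) = 1 and nfin(v) > 1."""
    return [v for v in nfin
            if v not in scheduled and nu[v] == 1 and nfin[v] > 1]


# Assumed fanin counts after the input vertices 1 and 2 are scheduled.
nfin = {1: 0, 2: 0, 3: 2, 4: 2, 5: 1, 6: 1, 7: 1}
nu = {1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1}
candidates = feedback_candidates(nu, nfin, scheduled={1, 2})
```

Both conditions are necessary for a vertex to be a candidate but, as the discussion that follows shows, a candidate is not guaranteed to be entered by a feedback path.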

Fig. 3.6 shows a directed graph with a single feedback path and
its adjacency matrix X. After the input vertices v_1 and v_2 are
scheduled, the associated rows and columns of v_1 and v_2 are removed
from X. In the modified matrix, the sum of each column equals the
number of fanins at this point; that is,
nu(v_3)=nu(v_4)=nu(v_5)=nu(v_6)=1. By applying the above Lemma, we
have two candidates, v_3 and v_4. Both have the same number of
original fanins, namely two. Which one should be scheduled next?
Obviously v_3 is the answer. In the following, a new algorithm for
checking the feedback path is listed :

Algorithm 3.5

In Algorithm 3.5, a search is carried out to find the single
feedback path (v',v) from v' to v. The stack or queue of Algorithm
3.3 or 3.4 is empty at the outset.

The notation lab(v_i) is defined as
   lab(v_i) = 1  if v_i is visited
            = 0  elsewhere

BEGIN
   search the set of vertices V_o, where for each vertex
   v in V_o, nu(v)=1 and nfin(v)>1.

L  choose a vertex v_i in V_o with lab(v_i)=0
   BEGIN
      FOR each vertex v_j in fout(v_i) DO
      BEGIN
         IF lab(v_j)=0
         THEN
         BEGIN
            lab(v_j)=1
            push v_j into the queue Q"
         END
      END

L"    pop out the vertex v_j from the queue Q"
      FOR each vertex v_k in fout(v_j) DO
      BEGIN
         IF v_k=v_i
         THEN GO TO L' {feedback path is found}
         ELSE
         BEGIN
            IF lab(v_k)=0
            THEN
            BEGIN
               lab(v_k)=1
               push v_k into the queue Q"
            END
         END
      END

      IF the queue Q" is NOT empty
      THEN GO TO L"
      ELSE GO TO L
   END (L)

L' BEGIN
      (v_j,v_i) is the feedback path
   END
END


Example  3.5  illustrates  the  procedures  given  in  this  algorithm. 


Example 3.5 :

Consider the directed graph in Fig. 3.6, where a feedback path
exists from vertex 7 to vertex 3. After vertices 1 and 2 are scheduled
by using Algorithm 3.3 or Algorithm 3.4, vertices 3 and 4 are selected
as candidates for the possibility of having feedback paths entering
them. Following Algorithm 3.5, two possibilities exist :

(1) If vertex 4 is chosen first, then vertices 4 and 5 are marked
and the search ends at vertex 5. Therefore, no feedback path enters
vertex 4. Vertex 3 is selected next; and the search proceeds through
vertices 6 and 7 and terminates back at 3. Therefore, a feedback path
exists from 7 to 3. Further search will bypass 4 since it has
already been marked 'old'.

(2) If vertex 3 is chosen first, vertices 4, 6, 5 and 7 are
visited and marked sequentially before the search terminates back at
vertex 3, which again indicates a feedback path from 7 to 3.
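
Both cases of the example can be replayed with a small queue-based search. This is a sketch under assumptions: the edge list is inferred from the description of Fig. 3.6 (the figure itself is not reproduced here), the function name is hypothetical, and the starting candidate is marked as visited, which the example implies when it says vertex 4 is marked.

```python
from collections import deque

def find_feedback_path(fout, candidates):
    """Return (v_from, v_to) for the first feedback path found, else None."""
    visited = set()
    for start in candidates:
        if start in visited:              # already marked by an earlier search
            continue
        visited.add(start)
        queue = deque()
        for w in fout[start]:
            if w not in visited:
                visited.add(w)
                queue.append(w)
        while queue:
            v = queue.popleft()
            for w in fout[v]:
                if w == start:            # the search came back to its origin
                    return (v, start)
                if w not in visited:
                    visited.add(w)
                    queue.append(w)
    return None


# Edges assumed from the description of Fig. 3.6; inputs 1 and 2 are omitted.
fout = {3: [4, 6], 4: [5], 5: [], 6: [7], 7: [3]}
path_a = find_feedback_path(fout, candidates=[4, 3])   # case (1)
path_b = find_feedback_path(fout, candidates=[3, 4])   # case (2)
```

In case (2) the queue visits vertices 4, 6, 5 and 7 in that order, matching the order given in the example, and both cases report the path from 7 to 3.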

Remark 3.6 :

Since each vertex is labeled at most once and each edge is
examined at most once, the time complexity for this algorithm is
O(|V|+|E|).

Remark 3.7 :

If there is no feedback loop in the network, then Algorithm 3.5
would not take any computation time in checking feedback.

Remark 3.8 :

For arbitrary networks, this algorithm may not be as satisfactory
in identifying minimal feedback loops as other, more complex
algorithms [27]. However, general algorithms are not cost effective
because the complexity grows exponentially with the size of the
network [27]. Algorithm 3.5, on the other hand, is cost effective for
digital electronic networks, where the feedback loops are generally
regular and simple.
3.6.  Analysis Sequencing for Relevant Parts

For most digital circuits, only a small portion of the entire
circuit is active at any time. Latency and selective trace (or event
scheduler) techniques have been used to take advantage of this fact
and save CPU time and memory storage [13, 14]. Latency exploitation
amounts to identifying the inactive parts of the circuit at each
timepoint in the solution process and bypassing them at that
timepoint. In contrast, the selective trace technique depends on
finding the active parts and analyzing them in the proper sequence.
Beyond these two techniques, a new technique which could save
additional computation time and memory is described next.

In many cases, especially in digital circuits, the output of
interest may be directly or indirectly affected by only a subset of
the subcircuits in the system. These subcircuits will be referred to
as the 'relevant' parts of the system. During the simulation, it is
only necessary to analyze the relevant parts even if the remaining
parts of the system are active. For example, Fig. 3.7 (a) shows the
entire circuit to be analyzed, which has been partitioned into seven
unilateral subcircuits. If one is only interested in the output of
subcircuit 7, then, instead of all seven subcircuits, only four
subcircuits 1, 2, 4 and 7 need to be scheduled and analyzed (Fig. 3.7
(b)). Similar results can be obtained as shown in Fig. 3.7 (c) and
(d) if the output of subcircuit 6 only, or subcircuit 5 only, is of
interest. The concept of analyzing only 'relevant' parts is similar
to the concept of 'segmentation' in logic simulation [27].
This approach can be combined with either a latency technique or
a selective trace technique to further increase the speed of
simulation. For large-scale circuit simulation, an efficient algorithm
for scheduling the relevant parts is given below. First, the sequence
of analysis is constructed for the G(V,E) corresponding to the entire
circuit by using Algorithm 3.3 or Algorithm 3.4. Second, the
vertices associated with the relevant parts are found and labeled by
tracing backward from the outputs of interest to the inputs.
Finally, the nonrelevant vertices are deleted from the sequence of
analysis. Then the remaining sequence identifies the relevant parts.

Algorithm 3.6

This algorithm sequences the vertices of the relevant parts only.

To describe the algorithm, we use the following notations :

lcr(i), i=1,m', : the sequence of analyzing vertices
of the relevant parts only.

lbk(v_i) = 1  if v_i is in the relevant set
         = 0  elsewhere.

The  algorithm  is  composed  of  three  parts  : 

Part I : (analysis sequencing for the directed graph G(V,E)
which corresponds to the entire circuit)

BEGIN
   k=1
   FOR each vertex v_i in G(V,E) DO
   BEGIN
      nu(v_i)=nfin(v_i)
      IF nu(v_i)=0
      THEN push the vertex v_i into the queue Q
   END

   REPEAT
      IF the queue Q is not empty
      THEN
      BEGIN
         pop out v_j from the queue Q
         s(k)=v_j
         k=k+1
         FOR each vertex v_i in fout(v_j) DO
         BEGIN
            nu(v_i)=nu(v_i)-1
            IF nu(v_i)=0
            THEN push v_i into the queue Q
         END
      END
      ELSE
      BEGIN
         check the feedback path
         (see the Algorithm 3.5)
         push the associated vertex into the queue Q
      END
   UNTIL all vertices in G(V,E) are scheduled (k>m)
END


Part II : (identifying the relevant vertices by tracing them
backward from the vertices v_oi, i=1,m", whose voltage values are of
interest)

Reverse every edge in G(V,E) and obtain GR(V,E). Since fin(v)
and fout(v) in G(V,E) are fout(v) and fin(v) in GR(V,E), let
finR(v)=fout(v) and foutR(v)=fin(v).

BEGIN
   FOR each vertex v_oi DO
   BEGIN
      push the vertex v_oi into the queue Q'
      lbk(v_oi)=1
   END

   REPEAT
   BEGIN
      pop out the vertex v_j from the queue Q'
      IF nfin(v_j)≠0
      THEN
      BEGIN
         FOR each vertex v_i in foutR(v_j) DO
         BEGIN
            IF lbk(v_i)=0
            THEN
            BEGIN
               push v_i into the queue Q'
               lbk(v_i)=1
            END
         END
      END
   END
   UNTIL the queue Q' is empty
END


Part III : (deleting nonrelevant vertices from the sequence
obtained in Part I and giving the sequence of the relevant set only)

BEGIN
   k=0
   FOR i=1 TO m DO
   BEGIN
      IF lbk(s(i))=1
      THEN
      BEGIN
         k=k+1
         lcr(k)=s(i)
      END
   END
END
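
Parts II and III can be sketched together in Python as a backward trace followed by a filter. The names and the graph below are hypothetical; in particular, the seven-subcircuit graph is made up for illustration and is not claimed to match Fig. 3.7, although it happens to give the same relevant set for subcircuit 7.

```python
from collections import deque

def relevant_sequence(fout, full_sequence, outputs):
    """Keep only the vertices of full_sequence that can affect the outputs."""
    fin = {v: [] for v in fout}               # fanin lists = reversed edges
    for v in fout:
        for w in fout[v]:
            fin[w].append(v)

    relevant = set(outputs)
    queue = deque(outputs)
    while queue:                              # backward trace from outputs
        v = queue.popleft()
        for u in fin[v]:
            if u not in relevant:
                relevant.add(u)
                queue.append(u)
    return [v for v in full_sequence if v in relevant]


# Hypothetical seven-subcircuit graph; it is not claimed to match Fig. 3.7.
fout = {1: [4], 2: [4, 5], 3: [5, 6], 4: [7], 5: [], 6: [], 7: []}
full = [1, 2, 3, 4, 5, 6, 7]
only_7 = relevant_sequence(fout, full, outputs=[7])
only_5 = relevant_sequence(fout, full, outputs=[5])
```

Because the filter preserves the order of the full sequence, the relevant vertices are still analyzed in a valid order.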


3.7.  Extension to Multiprocessor Computer

First, we assume that the analysis time for each vertex is the
same and that the directed graph is acyclic. If there is no
limitation on the number of processors or if the number of processors
available is not less than the maximum number of processors required
for each level of the sequence, then the minimum computation time is
obtained.

Lemma 3.3 :

If an unlimited number of processors are available or if the
number of processors available is not less than the maximum number of
processors required, then the minimum completion time of the solution
process is the number of levels, which is the length of the longest
path in the directed graph.

For  example,  for  the  directed  graph  in  Fig.  3.2,  the  minimum 
completion  time  is  9. 

When  the  number  of  processors  required  is  too  large,  many  of 
them  will  be  idle  most  of  the  time,  which  is  not  economical.  In 
practice,  there  is  always  a  limit  on  the  number  of  processors  avail¬ 
able  in  a  multi-processor  computer  system.  Therefore,  we  will  con¬ 
sider  next  the  case  when  a  limited  number  of  processors  are  avail¬ 
able. 

In recent years, many scheduling strategies have been proposed
to process the task directed graph with the number of processors
available [28]. These strategies show different levels of complexity
and give different degrees of processor utilization.

Reverse each edge in G(V,E) and obtain the reversed graph
GR(V,E). The level number assigned by Algorithm 3.1 to a vertex v in
GR(V,E) is levR(v).

Definition 3.6 :

The rank r(v) of a vertex v is defined to be

r(v) = D - levR(v)

where D is the depth of the directed graph G(V,E).
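
A short sketch of the rank computation follows. It assumes an acyclic graph and computes levR(v) as the length of the longest path from v to a vertex without fanout, which is what levelizing the reversed graph yields in that case; the function name and the five-vertex graph are hypothetical.

```python
def ranks(fout):
    """Return r(v) = D - levR(v) for every vertex of an acyclic graph."""
    lev_r = {}                        # longest distance from v to any output

    def distance_to_output(v):
        if v not in lev_r:
            lev_r[v] = max((1 + distance_to_output(w) for w in fout[v]),
                           default=0)
        return lev_r[v]

    depth = max(distance_to_output(v) for v in fout)
    return {v: depth - lev_r[v] for v in fout}


# Hypothetical graph: the chain a->b->c->d sets the depth D = 3,
# while e feeds d directly and can therefore wait until rank 2.
g = {"a": ["b"], "b": ["c"], "c": ["d"], "e": ["d"], "d": []}
r = ranks(g)
```

Vertex e has level 0 but rank 2, which shows how the rank can exceed the level for vertices that are off the longest path.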


Example 3.6 :

The  rank  sequence  for  the  directed  graph  in  Fig.  3.2  is 


rank 


0  12  3  4 


2  1  8  6  17 

3  11  7  20 

4  12  13 

3  14  18 

9  15  19 

10  16 


5 


21 


6 


22 

23 

24 


7 


25 

26 


8 


27 

28 


The  rank  number  is  the  latest  time  that  a  vertex  must  be 
scheduled  to  have  the  minimum  completion  time  for  the  directed  graph. 

A  ready  task  has  been  defined  to  be  the  vertex  v  for  which  the 
vertices  in  fin(v)  have  all  been  scheduled.  The  strategy  followed  is 
to  schedule  the  vertex  with  the  smallest  rank  number  among  all  the 
ready  tasks. 


Example 3.7 :

For the directed graph shown in Fig. 3.2, if the number of
processors available is p=3, the sequences obtained by using the above
strategy are

p=3

sequence    1    2    3    4    5    6    7    8    9   10   11

            2    4   10   12   16   13   17   21   22   25   27
            1    5    8   14    6   18   20        23   26   28
            3    9   11   15    7   19             24
If the strategy is to schedule the one with the smallest level
number instead of that with the smallest rank among the ready tasks,
then the total completion time may be longer. The reason for this is
that the level number can be smaller than the rank number, which may
delay the scheduling of key vertices. This phenomenon is shown in
the following example.

Example 3.8 :

p=3

sequence    1    2    3    4    5    6    7    8    9   10   11   12

            1    4    7   10   13   16   17   20   21   22   25   27
            2    5    8   11   14        18             23   26   28
            3    6    9   12   15        19             24

It  can  be  seen  that  the  completion  time  is  12  rather  than  11  as 
in  the  previous  example. 
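
The effect of the priority choice can be reproduced on a much smaller graph. The sketch below is a simple list scheduler for unit-time tasks; the function name, the seven-task graph, and the hand-written rank and level tables are all hypothetical and are not taken from Fig. 3.2.

```python
def list_schedule(fout, priority, p):
    """Schedule unit-time tasks on p processors, smallest priority first."""
    nu = {v: 0 for v in fout}
    for v in fout:
        for w in fout[v]:
            nu[w] += 1
    ready = [v for v in fout if nu[v] == 0]
    steps = []
    while ready:
        ready.sort(key=lambda v: (priority[v], v))
        batch, ready = ready[:p], ready[p:]       # at most p tasks per step
        steps.append(batch)
        for v in batch:
            for w in fout[v]:
                nu[w] -= 1
                if nu[w] == 0:                    # becomes ready next step
                    ready.append(w)
    return steps


# Hypothetical graph: a critical chain a->b->c->d plus three isolated tasks.
g = {"a": ["b"], "b": ["c"], "c": ["d"], "d": [], "e": [], "f": [], "g": []}
rank = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 3, "f": 3, "g": 3}
level = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 0, "f": 0, "g": 0}
by_rank = list_schedule(g, rank, p=2)
by_level = list_schedule(g, level, p=2)
```

With two processors, the level-based priority spends a step on the isolated tasks before advancing the chain and needs five steps, while the rank-based priority finishes in four.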

The  time  to  finish  all  the  tasks  (the  schedule  length)  provides 
a  measure  of  processor  utilization.  For  the  strategy  of  scheduling 
based on the smallest rank number, the ratio of the schedule length
to the optimal schedule length is bounded by 4/3 for two processors
and 2 - 1/(p-1) for p > 2 processors.

There  are  several  algorithms  which  give  the  schedule  length 
closer  to  the  optimal,  for  example,  the  Coffman-Graham  algorithm 
[29].  The  ratio  of  its  schedule  length  and  the  optimal  is  bounded  by 
2  -  2/p  where  p  is  the  number  of  processors.  These  algorithms  have 
more  complex  strategies. 
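
The two bounds quoted above can be tabulated for a few processor counts. This is plain arithmetic on the stated formulas; the function names are hypothetical and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def rank_bound(p):
    """Bound quoted above for smallest-rank-first scheduling."""
    return Fraction(4, 3) if p == 2 else 2 - Fraction(1, p - 1)

def coffman_graham_bound(p):
    """Bound quoted above for the Coffman-Graham algorithm."""
    return 2 - Fraction(2, p)

table = {p: (rank_bound(p), coffman_graham_bound(p)) for p in (2, 3, 4, 8)}
```

For p = 3 the bounds are 3/2 and 4/3 respectively, and for each p in this table the Coffman-Graham bound is the smaller of the two.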
In practice, the analysis time for each vertex (subcircuit) is
generally not the same and the number of processors available may be
variable. One processor should be assigned for the sequencing task. It
is assumed that this sequencing processor is efficient enough that it
will not delay any task of analyzing the vertex (subcircuit).
Algorithm 3.7 below gives one example of such an algorithm.


Algorithm 3.7

BEGIN
   k=1
   FOR each vertex v_i in G(V,E) DO
   BEGIN
      nu(v_i)=nfin(v_i)
      IF nu(v_i)=0
      THEN push the vertex v_i into the queue Q
   END

   REPEAT
      IF the queue Q is not empty
      THEN
      BEGIN
         the number of processors available is p
         the number of ready tasks in the queue Q is q
         IF q > p
         THEN n=p
         ELSE n=q
         c=0

         REPEAT {analyze these n vertices on n processors}
            pop out v_j from the queue Q
            c=c+1
            k=k+1
            FOR each vertex v_i in fout(v_j) DO
            BEGIN
               nu(v_i)=nu(v_i)-1
               IF nu(v_i)=0
               THEN push v_i into the queue Q
            END
         UNTIL c=n
      END
   UNTIL all vertices in G(V,E) are scheduled (k>m)
END
3.8.  Discussion

In this chapter, different applications of analysis sequencing
were discussed. As the task of sequencing needs to be done only once
before analyzing the circuits, it will require a small portion of the
total computation time. It is worthwhile to implement the
sophisticated algorithms which could save computation time and increase
the accuracy of the simulated results. In PREMOS, Algorithms 3.4, 3.5
and 3.6 have been implemented and the results are very satisfactory.

The scheduling algorithms for multi-processor computers depend
on the characteristics and the structure of the computer and on the
type of simulator being implemented. It has been shown in [23] that
the speed of logic simulation could be increased more than several
hundred times by using logic processors and array processors. For
large-scale circuit simulation, the study of sequence scheduling on a
multi-processor computer is a promising area of research.
CHAPTER 4

Nonlinear Analysis Methods


4.1.  Introduction

For the circuits containing nonlinear elements, both DC analysis
and transient analysis require solving sets of nonlinear algebraic
equations of the form

$g(x_n) = 0$    (4.1)

as  described  in  Chapter  2.  Eq.  (4.1)  is  usually  solved  by  using  a 
modified  Newton-Raphson's  method.  In  large-scale  circuit  analysis, 
the  entire  circuit  is  partitioned  into  smaller  subcircuits  and  non¬ 
linear  analysis  is  performed  at  the  subcircuit  level,  which  could 
provide  savings  in  CPU  time.  In  Section  4.2,  different  ways  of 
decomposing  an  electronic  circuit  into  subcircuits  are  discussed.  DC 
analysis  is  necessary  to  provide  the  operating  points  at  initial 
timepoint  and  is  also  used  at  every  timepoint  during  the  numerical 
integration  procedure.  Initial  DC  analysis  in  large-scale  circuit 
simulation  is  described  in  Section  4.3.  In  Section  4.4  it  is  shown 
that  solving  the  partitioned  nonlinear  subsystems  sequentially  takes 
much  less  effort  than  solving  the  entire  system  without  partitioning. 
A  new  modified  Newton's  method  for  the  DC  analysis  of  MOS  transistor 
circuits  using  a  2-element  companion  model  for  the  MOS  transistor 
(see  Appendix  1)  will  be  described  in  Section  4.5.  The  convergence 
properties of the new technique are investigated in Section 4.6. The
discussion and conclusion are given in Section 4.7.


4.2.  Decomposition

In macromodeling approaches such as in MOTIS [6] and MOTIS-C
[7], each subcircuit corresponds to a logic element such as a NAND,
NOR or transfer gate. Generally this approach does not give good
accuracy because interactions among logic elements by transfer gates
and series drive transistor effects are not modeled sufficiently.
Recently, two approaches have been proposed to decompose circuits
into unilateral subcircuits. The first is to decompose the circuit
into subcircuits based on a clustering algorithm applied to the
circuit model in steady state; i.e. with all capacitors open-circuited.
This approach works well for MOS circuits since in steady state the
gate is not affected by the source or drain voltage. This approach
has been used in MOS timing [8] and logic simulation [3]. For
general circuits containing bipolar transistors, further approximations
are needed to obtain this type of 'one-way' circuit decomposition
[9]. Note that in this decomposition approach, subcircuits composed
solely of transfer gates are avoided. The second approach of circuit
decomposition is modular partitioning, where the circuit is composed
of identifiable modules or subcircuits [30]. In the program PREMOS,
a number of subcircuit types have been selected as primitive modules,
which have the 'one-way' property.
4.3.  Initial DC Analysis

After the procedures of decomposition and analysis sequencing
are completed, the DC analysis at the initial time point is done to
give appropriate initial values of voltages or currents for the
transient analysis. In this initial DC analysis, all capacitors are
open-circuited and their values are assumed zero. Some node voltages
could have been preset as the initial guess. Instead of solving the
entire circuit, the DC analysis is processed sequentially at the
subcircuit level following the analysis sequence. Although this
approach is a relaxation one compared to conventional circuit
simulation, it could generally provide relatively accurate DC levels
with reduced computational effort. Some modifications to Newton's
method have been used in evaluating the DC levels at each subcircuit:
(1) if the evaluated node voltage exceeds 2 Vcc, where Vcc is the
power supply voltage, then it is only recognized as Vcc; (2) if the
computed value of node voltage is less than -Vcc, then it is set to 0.
The objective is to prevent any large change of the node voltage
solution.
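
The two limiting rules can be written as a small helper applied to each node voltage after a Newton update. The function name and the numerical values in the usage lines are hypothetical; only the two rules themselves come from the text.

```python
def limit_node_voltage(v, vcc):
    """Apply the two limiting rules to a node voltage computed by Newton's method."""
    if v > 2.0 * vcc:      # rule (1): overshoot above 2*Vcc is recorded as Vcc
        return vcc
    if v < -vcc:           # rule (2): undershoot below -Vcc is set to 0
        return 0.0
    return v               # values inside the window are left unchanged


# Hypothetical 5 V supply.
too_high = limit_node_voltage(12.0, 5.0)
too_low = limit_node_voltage(-7.0, 5.0)
in_range = limit_node_voltage(3.2, 5.0)
```

Note that, as stated, the rules leave any value between -Vcc and 2 Vcc untouched, so moderate overshoot is still allowed.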

4.4.  Discussion on the Convergence Rates

In  this  section,  we  want  to  illustrate  that,  by  using  the 
Newton-Raphson  method,  solving  the  partitioned  nonlinear  subsystems 
sequentially  takes  less  effort  than  solving  the  entire  system.  Most 
of  the  theorems  and  definitions  mentioned  below  are  found  in  [16,  31, 
32]. 
The spectral radius $\rho(A)$ of the n by n matrix A is defined as
the maximum of the moduli of the eigenvalues of A; i.e., if
$\{\lambda_i\}$, $i = 1, \ldots, n$, is the set of eigenvalues of A,
then

$\rho(A) = \max_{1 \le i \le n} |\lambda_i|$    (4.2)

We consider the iterative method as a completely consistent
linear stationary method of first degree, which may be expressed in
the form

$u^{(n+1)} = G u^{(n)} + k, \quad n = 0, 1, 2, \ldots$    (4.3)

where G is the real n by n iteration matrix and k is an associated
known vector.

Theorem 4.1 [32] :

The iterative method (4.3) is convergent if and only if the
spectral radius $\rho(G)$ is less than one, i.e., $\rho(G) < 1$.

We define the rate of convergence, R, by

$R(G) = -\log \rho(G)$    (4.4)

Theorem 4.2 [32] :

Let the block triangular matrix T be partitioned. Then

$\lambda(T) = \bigcup_{i=1}^{n} \lambda(T_{ii})$    (4.5)

where $T_{ii}$ is the i-th block diagonal submatrix and $\lambda(T)$
and $\lambda(T_{ii})$ represent the sets of eigenvalues of T and
$T_{ii}$, respectively.
Let the iteration matrix G be partitioned into the block lower
triangular form. We could obtain the following theorem:

Theorem 4.3 :

The convergence rate of the iteration matrix G is the minimum
of that of the block diagonal submatrices $G_{ii}$, i.e.,

$R(G) = \min_i R(G_{ii})$    (4.6)

Proof :

By Theorem 4.2, the spectral radius of G is the largest of the
spectral radii of the $G_{ii}$; since $-\log$ is a decreasing function,
(4.6) follows from the definition (4.4).
It follows from Theorem 4.1 and Theorem 4.2.
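
A small numerical experiment illustrates how the slowest diagonal block governs the whole iteration. The sketch uses a 2 by 2 lower triangular matrix with 1 by 1 diagonal blocks, so its eigenvalues are simply the diagonal entries; the function name, the matrix entries, and the tolerance are hypothetical values chosen only for illustration.

```python
import math

def iterations_to_converge(G, k, tol=1e-9, limit=10000):
    """Iterate u <- G u + k from u = 0 and count steps until the change < tol."""
    n = len(k)
    u = [0.0] * n
    for step in range(1, limit + 1):
        new_u = [sum(G[i][j] * u[j] for j in range(n)) + k[i] for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_u, u)) < tol:
            return step
        u = new_u
    return limit


# Lower triangular G with 1-by-1 diagonal blocks, so its eigenvalues are
# the diagonal entries 0.1 and 0.9 (values chosen only for illustration).
G = [[0.1, 0.0],
     [0.5, 0.9]]
k = [1.0, 1.0]
rho = max(abs(G[i][i]) for i in range(2))     # spectral radius of G
rate = -math.log(rho)                          # rate of convergence R(G)

whole = iterations_to_converge(G, k)
fast_block = iterations_to_converge([[0.1]], [1.0])
```

Iterating the whole system takes many more steps than iterating the first block alone, because the block with eigenvalue 0.9 sets the pace for the coupled iteration.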

For the system whose iteration matrix can be represented in the
lower block triangular form, the number of iterations required for
the entire matrix to converge would depend upon the maximum of that
among all of its block diagonal submatrices. In practice, for many
of the subvectors $u_i$, few iterations are needed to reach steady
states. When the subsystems are solved independently in a certain
sequence, the number of nonlinear iterations required to solve each
subsystem depends upon its own rate of convergence. By using such an
approach, the computation time could be reduced significantly. For
the general systems whose iteration matrix is not in the block lower
triangular form, the block Gauss-Seidel-Newton algorithm [16] could
be used effectively provided that the coupling among the subsystems
is not very 'strong'. Thus, to solve a large-scale nonlinear system,
the basic approach is, first, to partition the entire system into
subsystems, and then to analyze the subsystems in the proper sequence.
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Numerical  Properties  of  The  Mixed  Method 


In conventional circuit analysis, 3-element companion models of static MOS transistors (Fig. 4.1) are generally used for representing the nonlinear operation of the device. In our approach, we assume that the gate-to-source voltage is given and thus use a 2-element companion model (Fig. 4.2) to evaluate the Jacobian matrix at each iteration. For the Gauss-Seidel algorithm with analysis sequencing, the fan-in gate voltage of a pass transistor is known, but the corresponding source (or drain) voltage may still be unsolved. If we use 2-element MOS transistor models by setting the unknown source voltage initially equal to its value at the previous iteration, the algorithm will no longer be the standard modified Gauss-Seidel method. It could be considered as a modified version of the Newton-Raphson algorithm at the subcircuit level. In the following, the results of using the 2-element model and the 3-element model for both Newton's algorithm and the standard Gauss-Seidel algorithm are compared. The convergence rate of the DC iteration is also discussed.

(i)  the  algorithm  with  the  3-element  companion  model 


For  the  test  circuit  shown  in  Fig.  4.3,  the  nodal  equations  of 
the  equivalent  circuit  using  a  3-element  model  for  the  MOS  device 
(Fig.  4.4)  are 


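\[ g_1(v_1 - V_{cc}) - i_1 + g_2 v_1 + g_{m2} v_3 + i_2 + g_3(v_1 - v_2) + g_{m3}(v_4 - v_2) + i_3 = 0 \tag{4.7} \]

\[ g_3(v_2 - v_1) - i_3 - g_{m3}(v_4 - v_2) = 0 \tag{4.8} \]

The above two equations can be represented in matrix form as

\[ \begin{bmatrix} g_1 + g_2 + g_3 & -g_{m3} - g_3 \\ -g_3 & g_{m3} + g_3 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} g_1 V_{cc} + i_1 - i_2 - i_3 - g_{m2} v_3 - g_{m3} v_4 \\ i_3 + g_{m3} v_4 \end{bmatrix} \tag{4.9} \]

or

\[ M_1 v = J_1 \]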
Solving (4.9) with the standard Gauss-Seidel method, we obtain

\[ \begin{bmatrix} g_1 + g_2 + g_3 & 0 \\ -g_3 & g_{m3} + g_3 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} g_1 V_{cc} + i_1 - i_2 - i_3 - g_{m2} v_3 - g_{m3} v_4 + (g_{m3} + g_3) v_2 \\ i_3 + g_{m3} v_4 \end{bmatrix} \tag{4.10} \]

or

\[ M_2 v = J_2 \]

(ii) the algorithm with the 2-element companion model

If we use the 2-element companion model for the MOS device and take the previous value of the source voltage $v_2$ in evaluating the model of T3, then the corresponding nodal equations of the equivalent circuit (Fig. 4.5) become

\[ \begin{bmatrix} g_1 + g_2 + g_3 & -g_3 \\ -g_3 & g_3 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} g_1 V_{cc} + i_1 - I_2 - I_3 \\ I_3 \end{bmatrix} \tag{4.11} \]

where

\[ I_2 = i_2 + g_{m2} v_3 \]

\[ I_3 = i_3 + g_{m3} v_4 - g_{m3} v_2^p \]

and $v_2^p$ is the value of $v_2$ at the previous iteration.


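In order to solve the above equations by using the standard Gauss-Seidel method, (4.11) should be modified to

\[ \begin{bmatrix} g_1 + g_2 + g_3 & 0 \\ -g_3 & g_3 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} g_1 V_{cc} + I_1 - I_2 - I_3 + g_3 v_2^p \\ I_3 \end{bmatrix} \tag{4.12} \]

where

\[ I_1 = i_1 \]

\[ I_2 = i_2 + g_{m2} v_3 \]

\[ I_3 = i_3 + g_{m3} v_4 - g_{m3} v_2^p \]

and $v_2^p$ is the value of $v_2$ at the previous iteration.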


In the following, four methods are compared. Method 1, whose iteration matrix is $M_1$ in (4.9), represents the conventional Newton-Raphson method using the 3-element companion model. Method 2, with iteration formula (4.10), is the standard Gauss-Seidel method using the 3-element model. Method 3, represented by (4.11), is the modified Newton's method with the 2-element model. Method 4, with iteration formula (4.12), is the standard Gauss-Seidel method using the 2-element companion model.

Table 4.1  Rate of Convergence.

        Initial Guesses       (w/l)* of         No. of DC Iterations Required
Case    (v1, v2, v3, v4)      T1, T2, T3    Method 1  Method 2  Method 3  Method 4
(1)     1.0  1.0  5.0  5.0    (4, 6, 4)         11        17        11        17
(2)     1.0  1.0  5.0  5.0    (4, 10, 1)        19        20        19        20
(3)     1.0  1.0  5.0  5.0    (4, 10, 2)        13        15        13        15
(4)     1.0  1.0  5.0  5.0    (4, 10, 4)        10        14        10        14
(5)     4.5  3.5  1.0  5.0    (4, 6, 4)         77        77        72        72
(6)     4.5  3.5  1.0  5.0    (4, 10, 1)       141       141       137       137
(7)     4.5  3.5  1.0  5.0    (4, 10, 2)       104       104       100       100
(8)     4.5  3.5  1.0  5.0    (4, 10, 4)        77        77        72        72

*In this table, (w/l) means the ratio of channel width w to channel length l of the MOS transistor.

Table 4.1 shows the rate of convergence of these four iterative methods, where the same criterion for convergence is applied. The number of iterations required for convergence depends upon the initial guess. The initial guesses are shown on the left-hand side of Table 4.1. To compare these four methods, an initial guess far from the solution is selected and a very strict criterion for convergence is set. This is one of the reasons why the number of DC iterations is large. The other reason is that the body effect is not taken into account in MOS device modeling, which could result in a small convergence rate. According to the results shown in Table 4.1, the following points could be made:

(1) Method 3 has the minimum number of iterations required in all cases.

(2) The Gauss-Seidel method takes more iterations than, or the same number as, Newton's method, and there is also a small discrepancy between the solutions.

(3) It was found that the smaller the w/l ratio of T3, the larger the number of iterations required to reach the solution. This can be explained from the matrix structure: the smaller the $g_3$, the smaller the convergence rate.


The  Number  of  Iterations  Required  To  Achieve  Convergence 

In this section, a comparison is made between the analysis results obtained by using different methods for a fixed preassigned number of iterations. During transient analysis, the initial guess at any timepoint is chosen as the previous value. Since the timestep is either small enough or controlled by the local truncation error, the initial guess is assumed to be close to the correct solution. Therefore, in practice, it is not necessary to go through as many iterations as shown in Table 4.1 to converge to the solution. Depending upon the operating points, different numbers of iterations are required for convergence. In large-scale circuit analysis, the number of iterations could be limited to save the computational effort in evaluating device models and in solving the matrix equations, provided the analysis results are reasonably accurate. In Table 4.2, the solutions after 3 iterations using the four different methods at different operating points are shown, where 'final solution' means the solution obtained by using Newton's method (Method 1).

From Table 4.2, the following observations can be made:

(1) Even with 3 iterations, the solutions for these four methods are good enough to satisfy the accuracy requirements of large-scale circuit simulation.

(2) From the accuracy point of view, Method 1 and Method 3 are better than Method 2 and Method 4. After considering the effort in evaluating the device models, Method 3 seems to be the best choice.

In PREMOS, there are a number of modular subcircuits which are primitives for the entire circuit. By studying the structures of the subcircuit equations and after performing a large number of DC analyses of the individual subcircuits, a fixed number of iterations was selected for each type of subcircuit, which gave reasonably accurate results under a wide range of initial conditions. By assigning a fixed number of iterations to different subcircuits, we could eliminate checking for convergence and having to compute unnecessary additional iterations, and thus reduce the computational requirements.


Table 4.2  Analysis Results

       Initial Guesses       w/l of               Final       The Values of v1 and v2 After 3 Iterations
Case   (v1, v2, v3, v4)      T1, T2, T3           Solution    Method 1   Method 2   Method 3   Method 4
(1)    0.3  0.3  5.0  5.0    (4, 10, 8)     v1    0.20624     0.20655    0.21352    0.20656    0.21352
                                            v2    0.20624     0.20681    0.21467    0.20681    0.21468
(2)    0.3  0.3  5.0  5.0    (4, 10, 4)     v1    0.20624     0.20674    0.20979    0.20674    0.20979
                                            v2    0.20624     0.20766    0.21169    0.20767    0.21170
(3)    0.3  0.3  5.0  5.0    (4, 6, 4)      v1    0.35008     0.34926    0.34545    0.34926    0.34545
                                            v2    0.35007     0.34844    0.34396    0.34845    0.34397
(4)    4.9  3.9  1.0  5.0    (4, 10, 8)     v1    4.9966      4.9917     4.9912     4.9914     4.9914
                                            v2    3.9777      3.9230     3.9230     3.9256     3.9256
(5)    4.9  3.9  1.0  5.0    (4, 10, 4)     v1    4.9966      4.9929     4.9927     4.9928     4.9928
                                            v2    3.9687      3.9134     3.9134     3.9143     3.9143
(6)    4.9  3.9  1.0  5.0    (4, 6, 4)      v1    4.9979      4.9941     4.9939     4.9940     4.9940
                                            v2    3.9687      3.9134     3.9134     3.9143     3.9143

±.l.  Discussion  and  Conclusion 

The effort required to evaluate the 3-element device model is at least twice that required to evaluate the 2-element device model. It has been shown that the new iteration method with the 2-element device model gives the same or a better convergence rate compared to that of the conventional 3-element model. Furthermore, comparison of the accuracy after 3 iterations indicates that the analysis results with this new iterative method are acceptable and are still comparable to those of the conventional Newton's method using the 3-element model. Of course, since convergence is not being checked, it is conceivable that under certain conditions, the results could be inaccurate. More theoretical study on this nonlinear iteration method needs to be carried out.

Setting a fixed number of iterations for different types of subcircuits is an empirical approach. Such concepts have been implicitly used in several large-scale circuit simulators, such as using only one iteration in timing simulators [6, 7]. It is not easy to decide the optimal number of iterations required. Even for certain types of subcircuits, there are many factors involved, such as the device sizes, operating points, etc. To evaluate the number of iterations required for each subcircuit, we could have a customized formula which gives different weights to the key factors involved. For simple single-node subcircuits, like inverters or NOR gates, one iteration has been found to give fairly good results. In general, for MOS circuits consisting of two to four nodes, three iterations seem to be sufficient to solve the subcircuit.
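The idea of running a preassigned number of Newton iterations without a convergence check can be illustrated on a single nonlinear node equation. This is only a sketch under stated assumptions: the function name `newton_fixed`, the normalized node equation and the starting value are hypothetical and are not taken from PREMOS.

```python
import math

def newton_fixed(f, dfdv, v0, n_iter):
    """Run exactly n_iter Newton-Raphson updates; no convergence check is made."""
    v = v0
    for _ in range(n_iter):
        v -= f(v) / dfdv(v)
    return v

# Hypothetical single-node example in normalized units: a linear load to a
# 5 V supply and a square-law pull-down give (v - 5) + v**2 = 0.
def f(v):
    return (v - 5.0) + v * v

def dfdv(v):
    return 1.0 + 2.0 * v

v_exact = (-1.0 + math.sqrt(21.0)) / 2.0  # positive root of v**2 + v - 5 = 0
v_prev = 1.5  # initial guess: value at the previous timepoint (assumed close)

for n in (1, 2, 3):
    v_n = newton_fixed(f, dfdv, v_prev, n)
    print(n, v_n, abs(v_n - v_exact))
```

With a starting value this close to the solution, the error after three updates is far below typical simulation tolerances. As the text notes, nothing guarantees this when the starting value is poor, since convergence is never tested.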
CHAPTER 5

The  Modified  Gauss-Seidel  Method 

5.1. Introduction

In the standard Gauss-Seidel method, any feedback loop is decoupled by assuming that there is no change in the feedback loop over the integration timestep. Since previous values are used for the 'unsolved' variable, errors are introduced. Traditionally, these errors could be reduced by some relaxation techniques such as the Newton-Gauss-Seidel method [16]. However, in large-scale circuit analysis, a one-sweep approach is desired to minimize the computation time. To increase the accuracy of the analysis with only one sweep, it has been found in this research work that explicit formulas could be used to predict the 'unsolved' variables when feedback loops exist in the system. Section 5.2 introduces a modified Gauss-Seidel method using prediction. The numerical properties of the method are discussed in Section 5.3. In Section 5.4, a numerical method is used to estimate the order of convergence for the proposed modified Gauss-Seidel method. In Section 5.5, a test for checking the presence of parasitic oscillatory components in the solution is discussed. The chapter ends with a discussion and conclusion in Section 5.6.
5.2. Modified Gauss-Seidel Method

When the Gauss-Seidel method is used in analyzing a circuit, the 'feedforward' interdependence among the subcircuits is accounted for by the analysis sequencing procedure. The 'feedback' interdependence, on the other hand, which is usually caused by feedback loops or floating capacitors or any other bilateral element connecting two subcircuits, is taken care of by using previous values. In this chapter, we introduce a new technique, the modified Gauss-Seidel method. This technique uses a forward predictor to evaluate the node voltages on the feedback loops (or the other node of a floating capacitor), rather than the previously computed values, when solving the associated subcircuits. First, we consider the method using a first-order predictor with the backward Euler integration formula.

(i)  feedback  loops 

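Consider the configuration shown in Fig. 5.1. The nodal matrix for conventional circuit simulation will be

\[ \begin{bmatrix} y_{11} & & & & g_m \\ y_{21} & y_{22} & & & \\ & y_{32} & y_{33} & & \\ & & \ddots & \ddots & \\ & & & y_{N,N-1} & y_{NN} \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_N \end{bmatrix} = \begin{bmatrix} j_1 \\ j_2 \\ j_3 \\ \vdots \\ j_N \end{bmatrix} \tag{5.1} \]

where $g_m$ is the transconductance at the operating point in the MOS device model.

If one uses the standard Gauss-Seidel method, the $g_m$ term in the upper triangular part caused by the feedback is moved to the right-hand side as shown in (5.2), where the term $g_m v_N$ uses the value of $v_N$ computed at a previous iteration step.

\[ \begin{bmatrix} y_{11} & & & & \\ y_{21} & y_{22} & & & \\ & y_{32} & y_{33} & & \\ & & \ddots & \ddots & \\ & & & y_{N,N-1} & y_{NN} \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_N \end{bmatrix} = \begin{bmatrix} j_1 - g_m v_N \\ j_2 \\ j_3 \\ \vdots \\ j_N \end{bmatrix} \tag{5.2} \]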

Note that the voltage $v_N$ is also one of the controlling voltages which determine $y_{11}$ during each iteration when solving the whole matrix. Thus, both $y_{11}$ and $j_1$ are affected by the value of $v_N$. In the proposed approach, the value of $v_N$ is first predicted at the present time point by using the previous points $v_N^{(n)}$ and $v_N^{(n-1)}$ according to the following formula

\[ v_{N,\,guess}^{(n+1)} = v_N^{(n)} + h_n \left( \frac{v_N^{(n)} - v_N^{(n-1)}}{h_{n-1}} \right) \tag{5.3} \]

where $h_n$ and $h_{n-1}$ represent the present and the previous time steps, respectively.
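The first-order predictor (5.3) is a linear extrapolation through the last two solved timepoints. A minimal sketch follows; the function name `predict_first_order` and the ramp test signal are assumptions made for illustration.

```python
def predict_first_order(v_n, v_nm1, h_n, h_nm1):
    """Forward predictor (5.3): extrapolate from the last two timepoints.

    v_n, v_nm1 : node voltage at the two most recent timepoints
    h_n        : present time step (to the point being predicted)
    h_nm1      : previous time step (between the two known points)
    """
    return v_n + h_n * (v_n - v_nm1) / h_nm1

# Hypothetical check on a ramp v(t) = 2 t + 1 with non-uniform time steps.
def ramp(t):
    return 2.0 * t + 1.0

t_nm1, t_n, t_np1 = 1.0, 1.5, 1.8
guess = predict_first_order(ramp(t_n), ramp(t_nm1), t_np1 - t_n, t_n - t_nm1)
print(guess, ramp(t_np1))
```

For a voltage that varies linearly in time the prediction is exact even with unequal steps. For a curved waveform the predicted value only approximates the unsolved voltage, which is then used in place of the previous value on the right-hand side of (5.2).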


A test program in which the above technique is implemented has been used to simulate a 3-stage ring oscillator (Fig. 5.2), which is a critical example of simulation with a feedback path. The timestep is variable and controlled by the local truncation error at each timepoint during the analysis. The results obtained are shown in Fig. 5.3, where it can be seen that the new approach produces more accurate results than the standard Gauss-Seidel method in about the same amount of computation time.

(ii)  floating  capacitors 

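The proposed method can also be used to take into account the feedback effects of floating capacitors. For the circuit shown in Fig. 5.4, the nodal equation in matrix form is

\[ \begin{bmatrix} y_{11} & & -c/h \\ y_{21} & y_{22} & \\ -c/h & y_{32} & y_{33} \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = \begin{bmatrix} j_1 \\ j_2 \\ j_3 \end{bmatrix} \tag{5.4} \]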

Using the proposed method, the value of $v_3$ is predicted by $v_{3,\,guess}$ in solving the equation

\[ y_{11} v_1 - (c/h)\, v_{3,\,guess} = j_1 \tag{5.5} \]

Then

\[ v_1 = (y_{11})^{-1} \left( j_1 + (c/h)\, v_{3,\,guess} \right) \tag{5.6} \]

where

\[ v_{3,\,guess}^{(n+1)} = v_3^{(n)} + h_n \left( \frac{v_3^{(n)} - v_3^{(n-1)}}{h_{n-1}} \right) \tag{5.7} \]


As shown in Fig. 5.5 (a) and (b), this method produces more accurate results than the standard Gauss-Seidel method when compared with the circuit simulation obtained by solving the entire matrix without partitioning. In all cases, the backward Euler formula is used for numerical integration and a local truncation error timestep control scheme with the absolute tolerance 1.0e-3 is employed during the integration process. With the same criterion used for checking the convergence of the nonlinear iterations and the same tolerance on the local truncation error, the number of total timepoints in this example is 427 for the circuit simulation, 398 for the standard Gauss-Seidel method and 419 for the modified Gauss-Seidel method. Since these three methods require about the same number of solution timepoints for transient analysis, they seem to have the same order of accuracy.

Fig. 5.3  Response of circuit of Fig. 5.2. (Vertical axis: Voltage (volts). A : Solving The Entire Matrix Without Partitioning; B : The Proposed Modified Gauss-Seidel Method; C : The Standard Gauss-Seidel Method.)

Fig. 5.5 (a)  Voltage Responses V(1) for Fig. 5.4 with the Backward Euler Integration Formula. (Curves A, B and C as in Fig. 5.3.)

Fig. 5.5 (b)  Voltage Responses V(3) for Fig. 5.4 with the Backward Euler Integration Formula. (Curves A, B and C as in Fig. 5.3.)

The modified Gauss-Seidel method has also been applied using the trapezoidal integration formula. In this case, the corresponding second-order predictor to be used for the 'unsolved' variables $v_i$ on the feedback loops (or the other node of a floating capacitor) is of the form

\[ v_{i,\,guess}^{(n+1)} = v_i^{(n)} + h_n \left[ \frac{3}{2} \left( \frac{v_i^{(n)} - v_i^{(n-1)}}{h_{n-1}} \right) - \frac{1}{2} \left( \frac{v_i^{(n-1)} - v_i^{(n-2)}}{h_{n-2}} \right) \right] \tag{5.8} \]
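The predictor (5.8) combines the two most recent backward-difference slopes with the Adams-Bashforth weights 3/2 and -1/2. The sketch below compares it with the single-slope predictor on a decaying exponential; the function names and the test waveform are assumptions chosen only for illustration.

```python
import math

def predict_one_slope(v_n, v_nm1, h_n, h_nm1):
    """Single-slope predictor: extrapolate along the latest difference slope."""
    return v_n + h_n * (v_n - v_nm1) / h_nm1

def predict_two_slopes(v_n, v_nm1, v_nm2, h_n, h_nm1, h_nm2):
    """Predictor (5.8): weight the two latest difference slopes by 3/2 and -1/2."""
    slope_new = (v_n - v_nm1) / h_nm1
    slope_old = (v_nm1 - v_nm2) / h_nm2
    return v_n + h_n * (1.5 * slope_new - 0.5 * slope_old)

# Hypothetical waveform v(t) = exp(-t), sampled with a uniform step h = 0.1.
h = 0.1
v0, v1, v2 = math.exp(0.0), math.exp(-h), math.exp(-2.0 * h)
v3_exact = math.exp(-3.0 * h)

err_one = abs(predict_one_slope(v2, v1, h, h) - v3_exact)
err_two = abs(predict_two_slopes(v2, v1, v0, h, h, h) - v3_exact)
print(err_one, err_two)
```

On this example the two-slope prediction has the smaller error. Both predictors reproduce a linear ramp exactly, so the difference only shows up on curved waveforms.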


For the bootstrap capacitor circuit given in Fig. 5.4, the simulated results shown in Fig. 5.6 indicate that this second-order method is also more accurate than the standard Gauss-Seidel method. In these simulations, the timestep is determined by a local truncation error timestep control scheme in which the absolute tolerance is 1.0e-12 and the relative tolerance is 1.0e-4 [5]. The total number of timepoints in this example is 140 for the circuit simulation, 117 for the standard Gauss-Seidel method and 127 for the modified Gauss-Seidel method. Generally, with the same accuracy requirement, the analysis using the trapezoidal integration formula takes fewer timepoints than the backward Euler formula.

Fig. 5.6 (a)  Voltage Responses V(1) for Fig. 5.4 with the Trapezoidal Integration Formula. (A : Solving The Entire Matrix Without Partitioning; B : The Proposed Modified Gauss-Seidel Method; C : The Standard Gauss-Seidel Method.)

Fig. 5.6 (b)  Voltage Responses V(3) for Fig. 5.4 with the Trapezoidal Integration Formula. (Curves A, B and C as in Fig. 5.6 (a).)


In summary, the modified Gauss-Seidel method is used to solve the partitioned system of equations, which now becomes a sequence of subsystem equations of the form :

\[ g_k\left( \dot{x}_k^{n+1},\; x_1^{n+1}, \ldots, x_k^{n+1},\; x_{k+1}^{p}, \ldots \right) = 0 \tag{5.9} \]

where

\[ x_i^{p} = x_i^{n} + h \dot{x}_i^{n} \tag{5.10} \]

in conjunction with the backward Euler integration formula, and

\[ x_i^{p} = x_i^{n} + h \left[ (3/2) \dot{x}_i^{n} - (1/2) \dot{x}_i^{n-1} \right] \tag{5.11} \]

with the trapezoidal integration formula. For a nonlinear system, eq. (5.9) is then solved using Newton's method.


5.3. Numerical Properties of The Predictor Method

To study the numerical properties of an integration method, a linear time-invariant zero-input asymptotically stable system of differential equations is chosen as the test problem, which is usually of the following form :

\[ \dot{x} = Ax, \qquad x(0) = x_0 \tag{5.12} \]

where $A \in R^{n \times n}$ and the set of eigenvalues of A, $\sigma(A)$, is in the open left half plane. Since floating capacitors could exist in the circuit, the test problem for the modified Gauss-Seidel method should be of the form :

\[ C\dot{x} = Ax \]

\[ \dot{x} = C^{-1}Ax = A'x \tag{5.13} \]

where C represents the capacitance matrix and is considered to be nonsingular. The eigenvalues of $C^{-1}A$, $\sigma(C^{-1}A)$, are assumed to be in the open left half plane. Let C = L + D + U where L is a strictly lower triangular, D is a diagonal and U is a strictly upper triangular matrix. Similarly, we have A = L' + D' + U' where L', D' and U' are defined in the same way as L, D and U.

Since either the backward Euler or the trapezoidal formula could be used to discretize the derivative operator, our discussion is separated into two parts:

(i) The modified Gauss-Seidel algorithm with the backward Euler integration formula :

The backward Euler formula is given as

\[ \dot{x}_{k+1} = (1/h)(x_{k+1} - x_k) \tag{5.14} \]

where $h = t_{k+1} - t_k$ and the subscript k refers to a particular time. By applying the predictor formula

\[ x_{k+1}^{p} = x_k + h \dot{x}_k \tag{5.15} \]

with the backward Euler integration formula to the test system (5.13), the following recursive relations are obtained:

\[ \begin{aligned} \left[ (D+L) - h(D'+L') \right] x_{k+1} &= C x_k - U x_{k+1}^{p} + h U' x_{k+1}^{p} \\ &= C x_k - U [x_k + h \dot{x}_k] + h U' [x_k + h \dot{x}_k] \\ &= C x_k - U [x_k + h C^{-1} A x_k] + h U' [x_k + h C^{-1} A x_k] \\ &= \left[ (C-U) + h(U' - U C^{-1} A) + h^2 U' C^{-1} A \right] x_k \end{aligned} \tag{5.16} \]

\[ x_{k+1} = M_B(h)\, x_k \tag{5.17} \]

where $x_k$ and $\dot{x}_k$ are assumed to be exact and the companion matrix

\[ M_B(h) = \left[ (C-U) - h(A-U') \right]^{-1} \left[ (C-U) + h(U' - U C^{-1} A) + h^2 U' C^{-1} A \right] \tag{5.18} \]

To study the numerical properties of the integration algorithm described in (5.17), the following definition is used [33].

Definition 5.1 :

An integration algorithm is consistent and zero-stable if its companion matrix M(h) can be expanded in a power series as a function of the stepsize h as

\[ M(h) = I + hA' + O(h^2) \tag{5.19} \]


Theorem 5.1 :

The modified Gauss-Seidel algorithm with the backward Euler integration formula is consistent and zero-stable.

Proof :

The companion matrix $M_B(h)$ in (5.18) can be further expanded as

\[ \begin{aligned} M_B(h) &= \left[ I - h(C-U)^{-1}(A-U') \right]^{-1} \left[ I + h(C-U)^{-1}(U' - U C^{-1} A) + h^2 (C-U)^{-1} U' C^{-1} A \right] \\ &= \left[ I + h(C-U)^{-1}(A-U') + O(h^2) \right] \left[ I + h(C-U)^{-1}(U' - U C^{-1} A) + O(h^2) \right] \\ &= I + h(C-U)^{-1}(A - U C^{-1} A) + O(h^2) \\ &= I + h C^{-1} A + O(h^2) \\ &= I + hA' + O(h^2) \end{aligned} \tag{5.20} \]

where I is the identity matrix. Following Definition 5.1, this algorithm is consistent and zero-stable.

Theorem 5.2 :

If a linear multistep method is consistent and zero-stable, then it is convergent.

The proof of Theorem 5.2 can be found in many numerical analysis textbooks such as [34].

Theorem 5.3 :

The modified Gauss-Seidel algorithm with the backward Euler integration formula is convergent.

Proof :

It follows from Theorem 5.1 and Theorem 5.2.

Let x(t) be the exact solution at t. The local truncation error (LTE) is given by:

\[ LTE_{k+1} = | x(t_{k+1}) - x_{k+1} | \tag{5.21} \]

If the order of accuracy of the local truncation error is p+1, which means $LTE_{k+1} = O(h^{p+1})$, then the method is of order p.

Theorem 5.4 :

The modified Gauss-Seidel algorithm with the backward Euler integration formula is a first-order algorithm.

Proof :

The Taylor series expansion for $x_k$ is

\[ \begin{aligned} x_k &= x(t_{k+1}) - h \dot{x}(t_{k+1}) + O(h^2) \\ &= x(t_{k+1}) - hA' x(t_{k+1}) + O(h^2) \end{aligned} \tag{5.22} \]

Substituting (5.22) into our local truncation error computation, we get

\[ \begin{aligned} LTE_{k+1} &= | x(t_{k+1}) - x_{k+1} | \\ &= | x(t_{k+1}) - [I + hA' + O(h^2)]\, x_k | \\ &= | \{ I - [I + hA' + O(h^2)][I - hA' + O(h^2)] \}\, x(t_{k+1}) | \\ &= O(h^2) \end{aligned} \tag{5.23} \]

Therefore, it is a first order algorithm.


Although we have considered D and D' to be diagonal matrices, the above properties can be extended to the case where D and D' are block diagonal matrices.

If we now consider the standard Gauss-Seidel algorithm with the backward Euler formula and apply it to the test system (5.13), we get

\[ \left[ (D+L) - h(D'+L') \right] x_{k+1} = C x_k - U x_k + h U' x_k \tag{5.24} \]

\[ x_{k+1} = M_{GS}(h)\, x_k \tag{5.25} \]

where

\[ M_{GS}(h) = \left[ (D+L) - h(D'+L') \right]^{-1} \left[ (C-U) + hU' \right] \]

Using the binomial expansion, $M_{GS}(h)$ can be expressed as :

\[ M_{GS}(h) = I + h(C-U)^{-1}A + O(h^2) \tag{5.26} \]

Since $(C-U)^{-1}A$ differs from $A' = C^{-1}A$ whenever U is nonzero, we can conclude that the standard Gauss-Seidel algorithm is not consistent when floating capacitors exist in the circuit.

For the Gauss-Jacobi method, the same arguments can be made by deriving the companion matrix

\[ M_{GJ}(h) = I + hD^{-1}A + O(h^2) \tag{5.27} \]

which also indicates that the Gauss-Jacobi algorithm is not consistent when floating capacitors exist in the circuit.
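The consistency statements above can be checked numerically for a small case by forming the companion matrices of (5.18) and (5.25) and comparing (M(h) - I)/h with C^{-1}A. The sketch below does this for a 2 by 2 system; the matrices C and A, the step h and all function names are hypothetical choices made for illustration.

```python
def mat_add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def mat_sub(a, b):
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]

def mat_scale(s, a):
    return [[s * a[i][j] for j in range(2)] for i in range(2)]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_inv(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

def strict_upper(a):
    """Strictly upper triangular part (U of C, or U' of A)."""
    return [[0.0, a[0][1]], [0.0, 0.0]]

def companion_modified(c, a, h):
    """Companion matrix (5.18) of the predictor-based method with backward Euler."""
    u, u_p = strict_upper(c), strict_upper(a)
    a_p = mat_mul(mat_inv(c), a)  # A' = C^-1 A
    lhs = mat_sub(mat_sub(c, u), mat_scale(h, mat_sub(a, u_p)))
    rhs = mat_add(mat_sub(c, u), mat_scale(h, mat_sub(u_p, mat_mul(u, a_p))))
    rhs = mat_add(rhs, mat_scale(h * h, mat_mul(u_p, a_p)))
    return mat_mul(mat_inv(lhs), rhs)

def companion_standard(c, a, h):
    """Companion matrix of the standard Gauss-Seidel method, from (5.24)."""
    u, u_p = strict_upper(c), strict_upper(a)
    lhs = mat_sub(mat_sub(c, u), mat_scale(h, mat_sub(a, u_p)))
    rhs = mat_add(mat_sub(c, u), mat_scale(h, u_p))
    return mat_mul(mat_inv(lhs), rhs)

def consistency_defect(m, c, a, h):
    """Largest entry of (M(h) - I)/h - C^-1 A; tends to zero if consistent."""
    a_p = mat_mul(mat_inv(c), a)
    ident = [[1.0, 0.0], [0.0, 1.0]]
    d = mat_sub(mat_scale(1.0 / h, mat_sub(m, ident)), a_p)
    return max(abs(d[i][j]) for i in range(2) for j in range(2))

# Hypothetical two-node circuit with a floating capacitor (off-diagonal C).
C = [[1.5, -0.5], [-0.5, 1.5]]
A = [[-1.5, 0.5], [0.5, -2.5]]
h = 1.0e-4
defect_modified = consistency_defect(companion_modified(C, A, h), C, A, h)
defect_standard = consistency_defect(companion_standard(C, A, h), C, A, h)
print(defect_modified, defect_standard)
```

For these values the defect of the predictor-based method is of the order of h, while that of the standard Gauss-Seidel method stays near 0.58 however small h is made, which matches (5.20) and (5.26).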

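(ii) The modified Gauss-Seidel algorithm with trapezoidal formula :

The trapezoidal formula

\[ \dot{x}_{k+1} = (2/h)(x_{k+1} - x_k) - \dot{x}_k \tag{5.28} \]

applied to the test system (5.13) yields

\[ C x_{k+1} - (h/2) A x_{k+1} = C x_k + (h/2) C \dot{x}_k \tag{5.29} \]

The predictor for the trapezoidal integration formula has the following form:

\[ x_{k+1}^{p} = x_k + h \left[ (3/2) \dot{x}_k - (1/2) \dot{x}_{k-1} \right] \tag{5.30} \]

(5.30) is an explicit second-order Adams-Bashforth formula. Applying this predictor method to (5.29), we obtain

\[ \begin{aligned} (D+L) x_{k+1} - (h/2)(D'+L') x_{k+1} &= C x_k - U x_{k+1}^{p} + (h/2) C \dot{x}_k + (h/2) U' x_{k+1}^{p} \\ &= C x_k - U \left[ x_k + (3h/2) \dot{x}_k - (h/2) \dot{x}_{k-1} \right] \\ &\quad + (h/2) C \dot{x}_k + (h/2) U' \left[ x_k + (3h/2) \dot{x}_k - (h/2) \dot{x}_{k-1} \right] \end{aligned} \tag{5.31} \]

Assuming $x_k$, $\dot{x}_k$, $x_{k-1}$ and $\dot{x}_{k-1}$ to be exact, (5.31) can be written as follows :

\[ \begin{aligned} (C-U) x_{k+1} - (h/2)(A-U') x_{k+1} &= (C-U) x_k + (h/2)(A + U' - 3 U C^{-1} A) x_k \\ &\quad + (3h^2/4) U' C^{-1} A x_k + (h/2) U C^{-1} A x_{k-1} - (h^2/4) U' C^{-1} A x_{k-1} \end{aligned} \tag{5.32} \]

and

\[ \begin{aligned} x_{k+1} &= \left[ (C-U) - (h/2)(A-U') \right]^{-1} \left[ (C-U) + (h/2)(A + U' - 3 U C^{-1} A) + (3h^2/4) U' C^{-1} A \right] x_k \\ &\quad + \left[ (C-U) - (h/2)(A-U') \right]^{-1} \left[ (h/2) U C^{-1} A - (h^2/4) U' C^{-1} A \right] x_{k-1} \\ &= \left[ I - (h/2)(C-U)^{-1}(A-U') \right]^{-1} \left[ I + (h/2)(C-U)^{-1}(A + U' - 3 U C^{-1} A) + (3h^2/4)(C-U)^{-1} U' C^{-1} A \right] x_k \\ &\quad + \left[ I - (h/2)(C-U)^{-1}(A-U') \right]^{-1} \left[ (h/2)(C-U)^{-1} U C^{-1} A - (h^2/4)(C-U)^{-1} U' C^{-1} A \right] x_{k-1} \end{aligned} \tag{5.33} \]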
Theorem 5.5 :

The modified Gauss-Seidel algorithm with the trapezoidal integration formula is consistent and stable.

Proof :

Assuming $x_k$ and $x_{k-1}$ are exact, the Taylor series expansion for $x_{k-1}$ is

\[ \begin{aligned} x_{k-1} &= x_k - h \dot{x}_k + (h^2/2) \ddot{x}_k + O(h^3) \\ &= \left[ I - hA' + (h^2/2)(A')^2 + O(h^3) \right] x_k \end{aligned} \tag{5.34} \]

Since the timestep h is small, we could also have the following binomial expansion

\[ \left[ I - (h/2)(C-U)^{-1}(A-U') \right]^{-1} = I + (h/2)(C-U)^{-1}(A-U') + (h^2/4) \left( (C-U)^{-1}(A-U') \right)^2 + O(h^3) \tag{5.35} \]

Substituting (5.34) and (5.35) into (5.33), we obtain

\[ \begin{aligned} x_{k+1} &= \left[ I + hA' + O(h^2) \right] x_k \\ &= M_T(h)\, x_k \end{aligned} \tag{5.36} \]

Following Definition 5.1, this algorithm is consistent and stable.


Theorem 5.6 :

The modified Gauss-Seidel algorithm with the trapezoidal integration formula is convergent.

Proof :

It follows from Theorem 5.5 and Theorem 5.2.

Theorem 5.7 :

The modified Gauss-Seidel algorithm with the trapezoidal integration formula is of second order.

Proof :

Assume $x_k$ and $x_{k-1}$ are exact and expand them into Taylor series about $t_{k+1}$

\[ x_{k+1-i} = \sum_{j \ge 0} \frac{(-ih)^j}{j!}\, x^{(j)}(t_{k+1}) = \sum_{j \ge 0} \frac{(-ih)^j}{j!}\, (C^{-1}A)^j\, x(t_{k+1}) \tag{5.37} \]

where i = 1, 2. Substituting (5.35) and (5.37) into (5.33), we obtain the local truncation error

\[ LTE_{k+1} = | x(t_{k+1}) - x_{k+1} | = O(h^3) \tag{5.38} \]

Hence, the proposed algorithm is of second order.


5.4. Order Test

In this section, we use a numerical test to verify the order of convergence of the modified Gauss-Seidel algorithm with the backward Euler integration formula. Assume that the numerical solution $x_n$ is related to the exact solution $x(t_n)$ through the relation

\[ x_n = x(t_n) + h^p C(t_n) \tag{5.39} \]

where p is the order of convergence. Since the dependency of $C(t_n)$ on the timestep h is of $O(h^{p+1})$, the order of convergence for the given system can be approximated by the following formula [36]

\[ p \approx \log \left( \frac{x_n(2h) - x_n(h)}{x_n(h) - x_n(h/2)} \right) / \log 2 \tag{5.40} \]

For the linear circuit in Fig. 5.7, the order of convergence found by using the above formula is given in Table 5.1. At the initial time (time = 0), V(1) = 5 V and V(2) = 0 V. Since the formula (5.40) is more valid at a smaller timestep, it can be verified that the value of p is closer to one as the timestep h gets smaller in Table 5.1. For the bootstrap capacitor circuit shown in Fig. 5.4, the results of these tests are listed in Table 5.2. Because there are nonlinear devices in this circuit, a number of iterations are required at every timepoint. The input waveform at V(1) is a step function falling from 5 V to 0 V in the time interval 20 ns to 30 ns. From Table 5.2, it is observed that the order of convergence of the modified Gauss-Seidel method is about one at 24 ns and approaches one at 28 ns with a decreasing timestep.


time = 40    V(2)    2.26    1.84    1.49

Table 5.2 Order of Convergence for the Circuit in Fig. 5.4
(columns: Measured Point; Modified Gauss-Seidel Method)


5.5. Accuracy Test

Applying nodal analysis to the test circuit of Fig. 5.8, we
obtain

$$ c_1 \dot v_1 + g_1 v_1 + g_p v_2 + c_3(\dot v_1 - \dot v_2) + g_3(v_1 - v_2) = 0 \qquad (5.41) $$

$$ c_2 \dot v_2 + g_2 v_2 + g_m v_1 + c_3(\dot v_2 - \dot v_1) + g_3(v_2 - v_1) = 0 \qquad (5.42) $$


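These node equations can be represented in the following matrix form

$$
\begin{bmatrix} c_1+c_3 & -c_3 \\ -c_3 & c_2+c_3 \end{bmatrix}
\begin{bmatrix} \dot v_1 \\ \dot v_2 \end{bmatrix}
+
\begin{bmatrix} g_1+g_3 & g_p-g_3 \\ g_m-g_3 & g_2+g_3 \end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}
= 0 \qquad (5.43)
$$

(5.43) can be rewritten in normal form

$$
\begin{bmatrix} \dot v_1 \\ \dot v_2 \end{bmatrix}
= A \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} \qquad (5.44)
$$

where

A(1,1) = ( -(c_2+c_3)(g_1+g_3) - c_3(g_m-g_3) ) / DE
A(1,2) = ( -(c_2+c_3)(g_p-g_3) - c_3(g_2+g_3) ) / DE
A(2,1) = ( -(c_1+c_3)(g_m-g_3) - c_3(g_1+g_3) ) / DE
A(2,2) = ( -(c_1+c_3)(g_2+g_3) - c_3(g_p-g_3) ) / DE

and

DE = c_1 c_2 + c_3 (c_1 + c_2)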
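The entries of A can be checked numerically. The following sketch builds A from the element values and computes the magnitude ratio of its two eigenvalues. It assumes that the ratio is taken as the smaller magnitude over the larger one; the function names are hypothetical, and the element values in the example are chosen only for illustration.

```python
import cmath


def system_matrix(c1, c2, c3, g1, g2, g3, gm, gp):
    """Entries of the 2x2 matrix A in the normal form (5.44)."""
    de = c1 * c2 + c3 * (c1 + c2)
    a11 = (-(c2 + c3) * (g1 + g3) - c3 * (gm - g3)) / de
    a12 = (-(c2 + c3) * (gp - g3) - c3 * (g2 + g3)) / de
    a21 = (-(c1 + c3) * (gm - g3) - c3 * (g1 + g3)) / de
    a22 = (-(c1 + c3) * (g2 + g3) - c3 * (gp - g3)) / de
    return ((a11, a12), (a21, a22))


def eigenvalues(a):
    """Roots of the characteristic equation z^2 - trace*z + det = 0."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


def stiffness_ratio(a):
    """Smaller eigenvalue magnitude divided by the larger one (assumed convention)."""
    small, large = sorted(abs(z) for z in eigenvalues(a))
    return small / large


# Example element values: c1 = c2 = 1, c3 = 0.1, g1 = 1, g2 = 0.1, gm = 1, g3 = gp = 0.
A = system_matrix(c1=1.0, c2=1.0, c3=0.1, g1=1.0, g2=0.1, g3=0.0, gm=1.0, gp=0.0)
ratio = stiffness_ratio(A)
print(A, ratio)
```

With these values the ratio comes out near 0.082; lowering g1 to 0.1 gives a ratio near 0.157.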

The eigenvalues of the matrix A, $\lambda_1$ and $\lambda_2$, can be obtained
by solving the characteristic equation of A. The absolute ratio of
$\lambda_1$ and $\lambda_2$, which is given in Table 5.3, shows the stiffness of
the test system (5.44) associated with the set of element values. By
applying the backward Euler formula

$$ \dot x_n = \frac{x_n - x_{n-1}}{h} \qquad (5.45) $$

to (5.43), we obtain

$$
\begin{bmatrix} (c_1+c_3)/h + g_1 + g_3 & -c_3/h + g_p - g_3 \\ -c_3/h + g_m - g_3 & (c_2+c_3)/h + g_2 + g_3 \end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}_{n}
+
\begin{bmatrix} -(c_1+c_3)/h & c_3/h \\ c_3/h & -(c_2+c_3)/h \end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}_{n-1}
= 0 \qquad (5.46)
$$

(i) Standard Gauss-Seidel Method


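Using the standard Gauss-Seidel method in (5.46), we obtain

$$
\begin{bmatrix} (c_1+c_3)/h + g_1 + g_3 & 0 \\ -c_3/h + g_m - g_3 & (c_2+c_3)/h + g_2 + g_3 \end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}_{n}
+
\begin{bmatrix} -(c_1+c_3)/h & g_p - g_3 \\ c_3/h & -(c_2+c_3)/h \end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}_{n-1}
= 0 \qquad (5.47)
$$

Assume a solution of the form $v_{1,n} = A_1 z^n$ and $v_{2,n} = A_2 z^n$, where $A_1, A_2 \neq 0$.
Then (5.47) is transformed into

$$
\begin{bmatrix}
z\,((c_1+c_3)/h + g_1 + g_3) - (c_1+c_3)/h & g_p - g_3 \\
z\,(-c_3/h + g_m - g_3) + c_3/h & z\,((c_2+c_3)/h + g_2 + g_3) - (c_2+c_3)/h
\end{bmatrix}
\begin{bmatrix} A_1 \\ A_2 \end{bmatrix}
= 0 \qquad (5.48)
$$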


To obtain a nonzero solution the matrix in (5.48) must be singular.
Thus the corresponding characteristic polynomial can be expressed as

$$ P z^2 + Q z + R = 0 \qquad (5.49) $$

where

P = ((c_1+c_3)/h + g_1 + g_3) ((c_2+c_3)/h + g_2 + g_3)

Q = -((c_1+c_3)/h) ((c_2+c_3)/h + g_2 + g_3) - ((c_2+c_3)/h) ((c_1+c_3)/h + g_1 + g_3)
    - (-c_3/h + g_m - g_3)(g_p - g_3)

R = ((c_1+c_3)/h) ((c_2+c_3)/h) - (c_3/h)(g_p - g_3)

The sufficient and necessary condition for the nonexistence of
oscillatory parasitic components in the computed solution is that the
roots of (5.49) are real and positive. The critical timestep $h_{crit}$ is
defined as the maximum timestep at and below which all roots of the
associated characteristic polynomial are real and positive. If
$Q^2 - 4PR \geq 0$, then the roots are real; otherwise, they become complex.
Whether the roots are positive will depend on the circuit parameters;
for example, when $g_p - g_3 = 0$ in the given test circuit, the roots
are always real and positive and thus there is no limit on $h_{crit}$. In
general, a table look-up method can be used to find $h_{crit}$ if it
exists. For each set of element values shown in Table 5.3, the
corresponding $h_{crit}$ is infinite in the cases #1 to #10 and at least
larger than 1.0e+4 for #11 and #12.

(ii)  Modified  Gauss-Seidel  Method 

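By applying the modified Gauss-Seidel method in (5.46), we
obtain

$$
\begin{bmatrix} (c_1+c_3)/h + g_1 + g_3 & 0 \\ -c_3/h + g_m - g_3 & (c_2+c_3)/h + g_2 + g_3 \end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}_{n}
+
\begin{bmatrix} -(c_1+c_3)/h & -c_3/h + 2g_p - 2g_3 \\ c_3/h & -(c_2+c_3)/h \end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}_{n-1}
+
\begin{bmatrix} 0 & c_3/h - g_p + g_3 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}_{n-2}
= 0 \qquad (5.50)
$$

Assume a solution of the form $v_{1,n} = A_1 z^n$ and $v_{2,n} = A_2 z^n$, where
$A_1, A_2 \neq 0$. Then

$$
\begin{bmatrix}
z^2((c_1+c_3)/h + g_1 + g_3) - z\,((c_1+c_3)/h) & z\,(-c_3/h + 2g_p - 2g_3) + c_3/h - g_p + g_3 \\
z^2(-c_3/h + g_m - g_3) + z\,(c_3/h) & z^2((c_2+c_3)/h + g_2 + g_3) - z\,((c_2+c_3)/h)
\end{bmatrix}
\begin{bmatrix} A_1 \\ A_2 \end{bmatrix}
= 0 \qquad (5.51)
$$

Again, this matrix must be singular. Thus, the following
characteristic polynomial can be obtained :

$$ P z^3 + Q z^2 + R z + S = 0 \qquad (5.52) $$

where

P = ((c_1+c_3)/h + g_1 + g_3) ((c_2+c_3)/h + g_2 + g_3)

Q = -((c_2+c_3)/h) ((c_1+c_3)/h + g_1 + g_3) - ((c_1+c_3)/h) ((c_2+c_3)/h + g_2 + g_3)
    - (-c_3/h + 2g_p - 2g_3)(-c_3/h + g_m - g_3)

R = ((c_1+c_3)/h) ((c_2+c_3)/h) - (c_3/h)(-c_3/h + 2g_p - 2g_3)
    - (-c_3/h + g_m - g_3)(c_3/h - g_p + g_3)

S = -(c_3/h)(c_3/h - g_p + g_3)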

The sufficient and necessary condition for the nonexistence of
oscillatory parasitic components in the computed solution is that the
roots of (5.52) are real and positive. Given the set of element
values, if the critical timestep $h_{crit}$ exists then at $h_{crit}$ we have

$$ q^3 + r^2 = 0 \qquad (5.53) $$

where

$$ q = \frac{1}{3}\frac{R}{P} - \frac{1}{9}\left(\frac{Q}{P}\right)^2 $$

$$ r = \frac{1}{6}\left(\frac{QR}{P^2} - \frac{3S}{P}\right) - \frac{1}{27}\left(\frac{Q}{P}\right)^3 $$

If $q^3 + r^2 \leq 0$ then all the roots are real, otherwise a pair of
conjugate roots exists [37]. Whether the roots are positive will
depend on the circuit parameters. Therefore, a table look-up method
can be used to find $h_{crit}$ by observing the transition of
$q^3 + r^2$ from negative to positive with increasing timestep h.
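The look-up procedure can be sketched as a scan over h. The code below is a minimal illustration, not the author's implementation: it evaluates the coefficients of (5.52), forms q^3 + r^2, walks upward in h on a geometric grid until the sign changes from non-positive to positive, and then bisects. It checks only the real-root condition, not the positivity of the roots. The function names, the scan range and the growth factor are assumptions.

```python
def cubic_coefficients(h, c1, c2, c3, g1, g2, g3, gm, gp):
    """Coefficients P, Q, R, S of the characteristic polynomial (5.52)."""
    k1 = (c1 + c3) / h
    k2 = (c2 + c3) / h
    k3 = c3 / h
    P = (k1 + g1 + g3) * (k2 + g2 + g3)
    Q = (-k2 * (k1 + g1 + g3) - k1 * (k2 + g2 + g3)
         - (-k3 + 2.0 * gp - 2.0 * g3) * (-k3 + gm - g3))
    R = (k1 * k2 - k3 * (-k3 + 2.0 * gp - 2.0 * g3)
         - (-k3 + gm - g3) * (k3 - gp + g3))
    S = -k3 * (k3 - gp + g3)
    return P, Q, R, S


def discriminant(h, **elements):
    """q^3 + r^2; non-positive when all three roots are real."""
    P, Q, R, S = cubic_coefficients(h, **elements)
    q = R / (3.0 * P) - (Q / P) ** 2 / 9.0
    r = (Q * R / P ** 2 - 3.0 * S / P) / 6.0 - (Q / P) ** 3 / 27.0
    return q ** 3 + r ** 2


def critical_timestep(elements, h_start=0.1, h_stop=1.0e4, growth=1.05):
    """First h at which q^3 + r^2 turns positive, or None if none is found."""
    h_lo = h_start
    while h_lo < h_stop:
        h_hi = h_lo * growth
        if discriminant(h_lo, **elements) <= 0.0 < discriminant(h_hi, **elements):
            for _ in range(60):  # bisection on the bracketing interval
                mid = 0.5 * (h_lo + h_hi)
                if discriminant(mid, **elements) <= 0.0:
                    h_lo = mid
                else:
                    h_hi = mid
            return h_lo
        h_lo = h_hi
    return None


# Illustrative element values: c1 = c2 = 1, c3 = 0.1, g2 = 0.1, gm = 1, g1 = g3 = gp = 0.
ELEMENTS = dict(c1=1.0, c2=1.0, c3=0.1, g1=0.0, g2=0.1, g3=0.0, gm=1.0, gp=0.0)
h_crit = critical_timestep(ELEMENTS)
print(h_crit)
```

For these element values the sign change falls between h = 9 and h = 10.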


In Table 5.3, $h_{crit}$ is listed at different element values
for the modified Gauss-Seidel method; $\lambda_1$ and $\lambda_2$ are the two
eigenvalues of the test system (5.44). The following conclusions can be
made from this table:

(1) The effect of increasing the floating capacitor $c_3$ is to
decrease $h_{crit}$, and vice versa.

(2) Increasing the conductance $g_1$ slightly lowers $h_{crit}$.

(3) Decreasing the conductance $g_2$ slightly increases $h_{crit}$.

(4) The effect of $g_m$ on $h_{crit}$ is more critical than that of the other
parameters in the circuit. The larger $g_m$ is, the smaller $h_{crit}$ is.

(5) The existence of the transconductance of the feedback current,
$g_p$, has a positive effect of increasing $h_{crit}$.

Table 5.3  h_crit for the Modified Gauss-Seidel Method.

  #    c1   c2   c3     g1    g2     g3   gm    gp    |λ1/λ2|   h_crit
  0    1    1    0.1    0     0.1    0    1     0     0         9.2
  1    1    1    1      0     0.1    0    1     0     0         0.91
  2    1    1    0.01   0     0.1    0    1     0     0         95
  3    1    1    0.1    1     0.1    0    1     0     0.0818    6.3
  4    1    1    0.1    0.1   0.1    0    1     0     0.1568    8.6
  5    1    1    0.1    0     1      0    1     0     0         6.4
  6    1    1    0.1    0     0.01   0    1     0     0         9.9
  7    1    1    0.1    0     0      0    1     0     0         10
  8    1    1    0.1    0     0.1    0    10    0     0         0.99
  9    1    1    0.1    0     0.1    0    0.1   0     0         64
 10    1    1    0.1    0     0.1    0    0     0     0         320
 11    1    1    0.1    0     0.1    0    1     1     0.7542    **
 12    1    1    0.1    0     0.1    0    1     0.1   0.5353    **

 ** means h_crit is at least larger than 1.0e+4

For the MOS inverter circuit, whose companion circuit model is
described  in  Fig.  5.9  but  with  g^  =  g^  *  0,  the  small  signal  gain  can 
be expressed as

$$ v_2 / v_1 = -g_m / g_2 $$

The typical value for $g_m$ is 0.7e-3, and that for $g_2$ is from 0 to
2.0e-3. From the above table, the range of $h_{crit}$ for simulating
the inverter circuit can be determined.


5.6. Discussion

The modified Gauss-Seidel method discussed in this chapter is
used to solve the circuit equations and to decouple the feedback
terms during the analysis. The technique of using a forward predictor
to estimate the values of the yet unsolved variables in the feedback
loops was found to be more accurate than the standard Gauss-Seidel
method, without requiring much additional computation. Provided
the timestep is less than a critical maximum timestep $h_{crit}$
associated with the set of element values, the accuracy test shows
that no oscillatory parasitic components are present in the computed
solution. In general, for a wide range of element values, the
associated $h_{crit}$ is relatively large, so this algorithm is reasonably
accurate.




CHAPTER  6 

Latency and Time-Step Control Scheme


6.1. Latency Scheme

6.1.1. Introduction

During the analysis of large-scale partitioned networks, a large
portion of the subnetworks is not active at any given time. This
temporary inactive behavior of a subnetwork is defined as 'latency'
[13, 38]. The latent state of a subsystem can be established by
monitoring the changes of all its stimuli and all its responses to
ensure that they stay within certain predetermined error bounds. Once
the latent status of a subsystem is established, the analysis of that
latent subsystem can be bypassed, which saves CPU time.

In [14], the latency at the subnetwork level was exploited and
four schemes of determining the latency in time were proposed. For
the one-way macromodelling approach, all subcircuits are unilateral
and each subcircuit can be identified as an 'event'. Therefore, we
can take maximum advantage of the latency in time to achieve
computational efficiency.




Latency Scheme for the Network Composed of Unilateral Subnetworks

For the subnetwork $N_k$, let the fanin node voltages be denoted by
$v_{ik_p}$, p = 1, 2, ..., and the internal node voltages of $N_k$ be
denoted by $v_{ok_q}$, q = 1, 2, ... .

Latency Scheme :

A subnetwork $N_k$ is considered to be latent at time $t_n$ if

(1)  $|\, v_{ik_p}(t_n) - v_{ik_p}(t_{n-1}) \,| < \varepsilon_1$

(2)  $|\, v_{ok_q}(t_n) - v_{ok_q}(t_{n-1}) \,| < \varepsilon_2$

p = 1, 2, ...    q = 1, 2, ...                                  (6.1)

The subnetwork $N_k$ will remain latent at time $t_{n+1}$ as long as

$|\, v_{ik_p}(t_{n+1}) - v_{ik_p}(t_n) \,| < \varepsilon_1$,    p = 1, 2, ...        (6.2)
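Conditions (6.1) and (6.2) translate directly into two small predicates. The sketch below is illustrative only and is not taken from PREMOS: the function names, the list-based representation of node voltages and the default tolerances are assumptions.

```python
def is_latent(fanin_now, fanin_prev, internal_now, internal_prev,
              eps1=1.0e-2, eps2=1.0e-3):
    """Condition (6.1): all fanin and internal node voltages changed by less
    than eps1 and eps2, respectively, over the last timestep."""
    fanin_quiet = all(abs(a - b) < eps1 for a, b in zip(fanin_now, fanin_prev))
    internal_quiet = all(abs(a - b) < eps2
                         for a, b in zip(internal_now, internal_prev))
    return fanin_quiet and internal_quiet


def stays_latent(fanin_next, fanin_now, eps1=1.0e-2):
    """Condition (6.2): an already latent subnetwork only needs its fanin
    voltages to remain quiet."""
    return all(abs(a - b) < eps1 for a, b in zip(fanin_next, fanin_now))


# A subnetwork with two fanin nodes and one internal node.
latent = is_latent([5.0, 0.0], [5.001, 0.0], [2.5], [2.5004])
still_latent = stays_latent([5.0, 0.0], [5.005, 0.0])
print(latent, still_latent)
```

Once a subnetwork is latent, only the cheaper fanin test has to be repeated at each new timepoint, so its internal voltages need not be recomputed until an input moves by eps1 or more.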


6.1.2. Examples

The latency scheme described in Section 6.1.2 has been successfully
implemented in the program PREMOS. The results of applying
this latency scheme to the transient analysis of a 2-bit adder, a
binary-to-octal decoder and a 10-stage inverter chain are given in
Table 6.1. The data in Table 6.1 correspond to the error tolerances
$\varepsilon_1$ = 1.0e-2 and $\varepsilon_2$ = 1.0e-3.


Table 6.1  Simulation Data for Transient Analysis.

Circuits                               With Latency   Without Latency   Percentage Savings
2-Bit Full Adder (Fig. 6.1)            18.050 sec     25.867 sec        30.2 %
Binary-to-Octal Decoder (Fig. 6.2)     11.417 sec     17.583 sec        35 %
10-Stage Inverter Chain (Fig. 6.3)     3.850 sec      6.417 sec         40 %


6.1.3. Discussion

The latency described above is at the subnetwork level. This
latency principle could be applied to any cluster of subnetworks in a
large network. In [13], a multilevel latent path algorithm is
presented and latency is exploited with modularities. For programs
having a multilevel macromodel structure, this approach could
achieve more computational savings.

Analysis  sequencing  for  the  relevant  parts  has  been  discussed  in 
Chapter  3.  During  the  simulation,  only  the  relevant  parts  of  the 
circuit  need  to  be  analyzed  even  when  the  remaining  parts  are  active. 
In  PREMOS,  this  approach  combined  with  the  latency  technique  is 
employed  to  provide  more  savings  in  computation  and  in  memory. 


6.2. Time-Step Control Scheme

6.2.1. Introduction

In circuit simulation, the stepsize is in general determined by
the local truncation error. The local truncation error of a numerical
integration algorithm for solving the initial-value problem

$$ \dot x = f(x, t), \qquad x(0) = x_0 \qquad (6.3) $$

is defined as

$$ \varepsilon_T(t_{n+1}) = x(t_{n+1}) - x_{n+1} \qquad (6.4) $$

where $x(t_{n+1})$ is the exact solution x(t) to Eq. (6.3) evaluated at
$t = t_{n+1}$, and $x_{n+1}$ is the corresponding numerical solution obtained at
the same time $t = t_{n+1} = t_n + h$, provided that in using the numerical
integration algorithm we assume that $x_n = x(t_n)$ is the exact solution
at $t = t_n$. In other words, the local truncation error is the error
made in one timestep.

The tolerance on the local truncation error is defined as

$$ D_T = h_n E_D \qquad (6.5) $$

where $E_D$ is the absolute value of the error allowed per unit time.
In (6.5), $E_D$ is an absolute tolerance. In practice, however, relative
tolerances are more meaningful. A larger $E_D$ is allocated for
the fast transient part and a smaller $E_D$ for the slower transient
part. After adding a relative tolerance, $D_T$ in (6.5) becomes

$$ D_T = h_n \left( \varepsilon_r\, |\dot x_{n+1}| + \varepsilon_a \right) \qquad (6.6) $$

In (6.6), $\varepsilon_r$ is the relative tolerance and $\varepsilon_a$ is the absolute
tolerance.


For the backward Euler integration formula, the local truncation
error is evaluated by

$$ \varepsilon_T(t_{n+1}) = -\frac{h_n^2}{2}\, \ddot x(\xi) \qquad (6.7) $$

where $t_n \leq \xi \leq t_{n+1}$. The second derivative $\ddot x(\xi)$ in (6.7) can be
approximated by using the divided difference formula

$$ \ddot x(\xi) \approx 2\, DD2 \qquad (6.8) $$

and

$$ DD2 = \frac{ \dfrac{x_{n+1} - x_n}{h_n} - \dfrac{x_n - x_{n-1}}{h_{n-1}} }{ h_n + h_{n-1} } \qquad (6.9) $$

If the local truncation error for timepoint $t_{n+1}$ is considered
satisfactory (less than the allowable tolerance), the new timestep $h_{n+1}$ to
compute the timepoint $t_{n+2}$ is increased. This feature allows the
solution to be found within the specified accuracy in fewer
timesteps. On the other hand, if the local truncation error is too
large, then the timepoint $t_{n+1}$ is recomputed by using the reduced
step-size. Therefore, the LTE at each timepoint is maintained within
the specified bounds.
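The estimate built from (6.7) to (6.9) needs only three consecutive solution values and the two timesteps between them. The following sketch evaluates it for the scalar equation x' = lam*x with lam = -10 and h = 0.01; the function names are hypothetical.

```python
def dd2(x_prev, x_now, x_next, h_prev, h_now):
    """Second divided difference (6.9) over x_{n-1}, x_n, x_{n+1}."""
    slope_now = (x_next - x_now) / h_now
    slope_prev = (x_now - x_prev) / h_prev
    return (slope_now - slope_prev) / (h_now + h_prev)


def lte_backward_euler(x_prev, x_now, x_next, h_prev, h_now):
    """LTE estimate (6.7) with the second derivative replaced by 2*DD2 (6.8)."""
    return -h_now ** 2 * dd2(x_prev, x_now, x_next, h_prev, h_now)


# Two backward Euler steps on x' = lam * x starting from x0 = 1.
lam, h = -10.0, 0.01
x0 = 1.0
x1 = x0 / (1.0 - h * lam)
x2 = x1 / (1.0 - h * lam)
estimate = lte_backward_euler(x0, x1, x2, h, h)
print(estimate)
```

For this example the estimate is about -4.13e-3, which has the same sign and order of magnitude as the exact one-step error of about -3.87e-3 for that step. The gap between the two reflects the inaccuracy of the divided difference approximation.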

Relaxed Version of Time-Step Control Scheme

For the timestep control scheme described in the last section,
if the old timestep $h_n$ is rejected, the solution at $t_{n+1}$ needs to be
evaluated again at the newly reduced timestep $h_n$. Generally, this
re-evaluation adds to the overhead needed for the timestep control
scheme and increases total computation time. Also, due to the
inaccuracy of the divided difference approximation to $\ddot x(\xi)$, some
undesirable situations can occur if one does not implement the
timestep control properly [5, 14].

In large-scale circuit simulation, most of the current programs
either use a fixed timestep or a variable timestep controlled by the
circuit activity (the maximum voltage variation) [18]. To have an
accurate and efficient timing analysis, it may be worthwhile to have
the timestep controlled by the local truncation error. However, some
modifications need to be done in order to reduce the overhead
required in the implementation. The desired features for a new
timestep control scheme are described in the following:

(1) The next timestep $h_{n+1}$ is controlled by the local truncation
error evaluated at the present timepoint $t_{n+1}$. This timestep is
accepted all the time, even when it is judged to be too large.

(2) The increased timestep is double the present one and the
reduced timestep is half. As shown later, this scheme ensures that
the local truncation error is generally within a reasonable range of
the specified tolerance even though there is no rejection of
timesteps.

The timestep control algorithm is as follows :

BEGIN
   BEGIN
      x_n     = node voltage at present timepoint t_n
      x_{n-1} = node voltage at previous timepoint t_{n-1}
      h_{n-1} = t_n - t_{n-1}
      h_{n-2} = t_{n-1} - t_{n-2}
      xdot_n  = (x_n - x_{n-1}) / h_{n-1}
      (h_n is the next timestep)
      facmax  = 0.0
   END (initialization)

   FOR each node voltage x DO
   BEGIN
      ed     = epa + epr * |xdot_n|
      factor = |xdot_n - xdot_{n-1}| / ed / (h_{n-1} + h_{n-2})
      facmax = max(facmax, factor)
   END (finding the maximum of LTE/DT)

   BEGIN
      emax = facmax * h_{n-1}
      IF (emax > 1.2)
         THEN a = 0.5
         ELSE
            BEGIN
               IF (emax < 0.4)
                  THEN a = 2.0
                  ELSE a = 1.0
            END
      h_n = a * h_{n-1}
   END
END

It should be noted that in the above algorithm the upper limit
of the local truncation error for reducing the timestep is $1.2\, h_n E_D$,
and the lower limit for increasing the timestep is $0.4\, h_n E_D$.
Although there is no rejection of the timestep at any timepoint, the
test equation below shows that the local truncation error stays
within the desired tolerance.
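The no-rejection scheme can be tried on a single-variable problem. The sketch below applies backward Euler to x' = lam*x and adjusts the timestep with the 1.2 and 0.4 thresholds above. Three details are assumptions rather than part of the algorithm as stated: the timestep is left unchanged after the very first step (no previous derivative exists yet), the last step is clipped so that the run ends at t_end, and the tolerance uses the derivative estimate xdot_n as in (6.6). All names and default values are hypothetical.

```python
def simulate_relaxed_control(lam=-10.0, x0=1.0, h0=0.005, t_end=1.0,
                             epa=0.01, epr=0.05):
    """Backward Euler on x' = lam * x with the no-rejection step control."""
    t, x, h = 0.0, x0, h0
    times, values, steps = [t], [x], []
    xdot_prev = None  # derivative estimate at the previous timepoint
    h_prev = None     # timestep that led to the previous timepoint
    while t < t_end - 1e-12:
        h = min(h, t_end - t)          # assumed: clip the final step
        x_new = x / (1.0 - h * lam)    # backward Euler update
        xdot = (x_new - x) / h
        t += h
        steps.append(h)
        times.append(t)
        values.append(x_new)
        a = 1.0                        # assumed start-up rule: keep h
        if xdot_prev is not None:
            ed = epa + epr * abs(xdot)
            factor = abs(xdot - xdot_prev) / ed / (h + h_prev)
            emax = factor * h          # estimated LTE divided by its tolerance
            if emax > 1.2:
                a = 0.5
            elif emax < 0.4:
                a = 2.0
        xdot_prev, h_prev, x = xdot, h, x_new
        h = a * h                      # always accepted, never recomputed
    return times, values, steps


times, values, steps = simulate_relaxed_control()
print(len(steps), steps[51], steps[52], values[-1])
```

With the default values the run starts with a timestep of 0.005, keeps it for 52 steps, and then doubles it repeatedly as the solution decays, reaching t = 1 in 83 steps instead of the 200 that a fixed timestep of 0.005 would need.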

Let us consider the test equation

$$ \dot x = \lambda x \qquad (6.10) $$

where $\lambda$ is negative. The exact solution of (6.10) is

$$ x = x_0\, e^{\lambda t} \qquad (6.11) $$

where $x_0$ is the solution at time zero. Applying the backward Euler
integration formula to (6.10), we obtain

$$ x_{n+1} = x_n \left[ \frac{1}{1 - h_n \lambda} \right] \qquad (6.12) $$

Consider the situation $h_{n-2} = h_{n-1} = h$ and $h_n = a\,h$, where a could be
2, 1 or 0.5. From Eq. (6.12), we obtain

$$ \frac{LTE_{n+1}}{LTE_n} = \frac{DD2_{n+1}\, h_n^2}{DD2_n\, h_{n-1}^2} = \frac{2a \cdots}{a+1} \qquad (6.13) $$
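The ratio of successive LTE estimates can be worked out directly from (6.9) and (6.12). The derivation below is a sketch under the stated step pattern $h_{n-2} = h_{n-1} = h$, $h_n = a\,h$. It uses the backward Euler identity $(x_{k+1} - x_k)/h_k = \lambda x_{k+1}$ and the magnitude estimate $LTE \approx h^2\, DD2$ from (6.7) and (6.8). The closed form is derived here and is not copied from the source.

```latex
\begin{aligned}
DD2_n     &= \frac{\lambda x_n - \lambda x_{n-1}}{2h}
           = \frac{\lambda^2 x_n}{2}, \\
DD2_{n+1} &= \frac{\lambda x_{n+1} - \lambda x_n}{(a+1)\,h}
           = \frac{a\,\lambda^2 x_{n+1}}{a+1}, \\
\frac{LTE_{n+1}}{LTE_n}
          &= \frac{DD2_{n+1}\,(a h)^2}{DD2_n\, h^2}
           = \frac{2a^3}{a+1}\cdot\frac{x_{n+1}}{x_n}
           = \frac{2a^3}{a+1}\cdot\frac{1}{1 - a h \lambda}.
\end{aligned}
```

For a = 2, 1 and 0.5 the leading factor $2a^3/(a+1)$ equals 16/3, 1 and 1/6, respectively, and since $\lambda$ is negative the second factor is always less than one.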


There are three cases to consider, depending on the value of a :

Table 6.2  Numerical Solution Obtained by Using Fixed Timestep
with λ = -10, x_0 = 1 and h = 0.01.

   n   Time        Exact Sol.   Bk Euler     Glb Err.     LTE
   0   .0000e+00   .1000e+01    .1000e+01    .0000e+00    .0000e+00
   1   .1000e-01   .9048e+00    .9091e+00   -.4253e-02   -.4253e-02
   2   .2000e-01   .8187e+00    .8264e+00   -.7716e-02   -.3867e-02
   3   .3000e-01   .7408e+00    .7513e+00   -.1050e-01   -.3515e-02
   4   .4000e-01   .6703e+00    .6830e+00   -.1269e-01   -.3196e-02
   5   .5000e-01   .6065e+00    .6209e+00   -.1439e-01   -.2905e-02
   6   .6000e-01   .5488e+00    .5645e+00   -.1566e-01   -.2641e-02
   7   .7000e-01   .4966e+00    .5132e+00   -.1657e-01   -.2401e-02
   8   .8000e-01   .4493e+00    .4665e+00   -.1718e-01   -.2183e-02
   9   .9000e-01   .4066e+00    .4241e+00   -.1753e-01   -.1984e-02
  10   .1000e+00   .3679e+00    .3855e+00   -.1766e-01   -.1804e-02
  11   .1100e+00   .3329e+00    .3505e+00   -.1762e-01   -.1640e-02
  12   .1200e+00   .3012e+00    .3186e+00   -.1744e-01   -.1491e-02
  13   .1300e+00   .2725e+00    .2897e+00   -.1713e-01   -.1355e-02
  14   .1400e+00   .2466e+00    .2633e+00   -.1673e-01   -.1232e-02
  15   .1500e+00   .2231e+00    .2394e+00   -.1626e-01   -.1120e-02
  16   .1600e+00   .2019e+00    .2176e+00   -.1573e-01   -.1018e-02
  17   .1700e+00   .1827e+00    .1978e+00   -.1516e-01   -.9257e-03
  18   .1800e+00   .1653e+00    .1799e+00   -.1456e-01   -.8415e-03
  19   .1900e+00   .1496e+00    .1635e+00   -.1394e-01   -.7650e-03
  20   .2000e+00   .1353e+00    .1486e+00   -.1331e-01   -.6955e-03
  21   .2100e+00   .1225e+00    .1351e+00   -.1267e-01   -.6323e-03
  22   .2200e+00   .1108e+00    .1228e+00   -.1204e-01   -.5748e-03
  23   .2300e+00   .1003e+00    .1117e+00   -.1142e-01   -.5225e-03
  24   .2400e+00   .9072e-01    .1015e+00   -.1081e-01   -.4750e-03
  25   .2500e+00   .8208e-01    .9230e-01   -.1021e-01   -.4318e-03
  26   .2600e+00   .7427e-01    .8391e-01   -.9632e-02   -.3926e-03
  27   .2700e+00   .6721e-01    .7628e-01   -.9072e-02   -.3569e-03
  28   .2800e+00   .6081e-01    .6934e-01   -.8533e-02   -.3244e-03
  29   .2900e+00   .5502e-01    .6304e-01   -.8016e-02   -.2950e-03
  30   .3000e+00   .4979e-01    .5731e-01   -.7521e-02   -.2681e-03
  31   .3100e+00   .4505e-01    .5210e-01   -.7049e-02   -.2438e-03
  32   .3200e+00   .4076e-01    .4736e-01   -.6600e-02   -.2216e-03
  33   .3300e+00   .3688e-01    .4306e-01   -.6174e-02   -.2015e-03
  34   .3400e+00   .3337e-01    .3914e-01   -.5769e-02   -.1831e-03
  35   .3500e+00   .3020e-01    .3558e-01   -.5387e-02   -.1665e-03
  36   .3600e+00   .2732e-01    .3235e-01   -.5025e-02   -.1514e-03
  37   .3700e+00   .2472e-01    .2941e-01   -.4685e-02   -.1376e-03
  38   .3800e+00   .2237e-01    .2673e-01   -.4364e-02   -.1251e-03
  39   .3900e+00   .2024e-01    .2430e-01   -.4063e-02   -.1137e-03
  40   .4000e+00   .1832e-01    .2209e-01   -.3779e-02   -.1034e-03
  41   .4100e+00   .1657e-01    .2009e-01   -.3514e-02   -.9398e-04
  42   .4200e+00   .1500e-01    .1826e-01   -.3265e-02   -.8544e-04
  43   .4300e+00   .1357e-01    .1660e-01   -.3032e-02   -.7767e-04
  44   .4400e+00   .1228e-01    .1509e-01   -.2814e-02   -.7061e-04
  45   .4500e+00   .1111e-01    .1372e-01   -.2610e-02   -.6419e-04
  46   .4600e+00   .1005e-01    .1247e-01   -.2420e-02   -.5835e-04
  47   .4700e+00   .9095e-02    .1134e-01   -.2243e-02   -.5305e-04
  48   .4800e+00   .8230e-02    .1031e-01   -.2078e-02   -.4823e-04
  49   .4900e+00   .7447e-02    .9370e-02   -.1924e-02   -.4384e-04
  50   .5000e+00   .6738e-02    .8519e-02   -.1781e-02   -.3986e-04
  51   .5100e+00   .6097e-02    .7744e-02   -.1647e-02   -.3623e-04
  52   .5200e+00   .5517e-02    .7040e-02   -.1524e-02   -.3294e-04
  53   .5300e+00   .4992e-02    .6400e-02   -.1409e-02   -.2995e-04
  54   .5400e+00   .4517e-02    .5818e-02   -.1302e-02   -.2722e-04
  55   .5500e+00   .4087e-02    .5289e-02   -.1203e-02   -.2475e-04
  56   .5600e+00   .3698e-02    .4809e-02   -.1111e-02   -.2250e-04
  57   .5700e+00   .3346e-02    .4371e-02   -.1025e-02   -.2045e-04
  58   .5800e+00   .3028e-02    .3974e-02   -.9464e-03   -.1859e-04
  59   .5900e+00   .2739e-02    .3613e-02   -.8733e-03   -.1690e-04
  60   .6000e+00   .2479e-02    .3284e-02   -.8055e-03   -.1537e-04
  61   .6100e+00   .2243e-02    .2986e-02   -.7428e-03   -.1397e-04
  62   .6200e+00   .2029e-02    .2714e-02   -.6848e-03   -.1270e-04
  63   .6300e+00   .1836e-02    .2468e-02   -.6312e-03   -.1155e-04
  64   .6400e+00   .1662e-02    .2243e-02   -.5816e-03   -.1050e-04
  65   .6500e+00   .1503e-02    .2039e-02   -.5358e-03   -.9541e-05
  66   .6600e+00   .1360e-02    .1854e-02   -.4935e-03   -.8674e-05
  67   .6700e+00   .1231e-02    .1685e-02   -.4544e-03   -.7885e-05
  68   .6800e+00   .1114e-02    .1532e-02   -.4184e-03   -.7169e-05
  69   .6900e+00   .1008e-02    .1393e-02   -.3851e-03   -.6517e-05
  70   .7000e+00   .9119e-03    .1266e-02   -.3543e-03   -.5924e-05
  71   .7100e+00   .8251e-03    .1151e-02   -.3260e-03   -.5386e-05
  72   .7200e+00   .7466e-03    .1046e-02   -.2999e-03   -.4896e-05
  73   .7300e+00   .6755e-03    .9513e-03   -.2758e-03   -.4451e-05
  74   .7400e+00   .6113e-03    .8649e-03   -.2536e-03   -.4046e-05
  75   .7500e+00   .5531e-03    .7862e-03   -.2331e-03   -.3679e-05
  76   .7600e+00   .5005e-03    .7148e-03   -.2143e-03   -.3344e-05
  77   .7700e+00   .4528e-03    .6498e-03   -.1969e-03   -.3040e-05
  78   .7800e+00   .4097e-03    .5907e-03   -.1810e-03   -.2764e-05
  79   .7900e+00   .3707e-03    .5370e-03   -.1663e-03   -.2513e-05
  80   .8000e+00   .3355e-03    .4882e-03   -.1527e-03   -.2284e-05
  81   .8100e+00   .3035e-03    .4438e-03   -.1403e-03   -.2076e-05
  82   .8200e+00   .2747e-03    .4035e-03   -.1288e-03   -.1888e-05
  83   .8300e+00   .2485e-03    .3668e-03   -.1183e-03   -.1716e-05
  84   .8400e+00   .2249e-03    .3334e-03   -.1086e-03   -.1560e-05
  85   .8500e+00   .2035e-03    .3031e-03   -.9966e-04   -.1418e-05
  86   .8600e+00   .1841e-03    .2756e-03   -.9146e-04   -.1289e-05
  87   .8700e+00   .1666e-03    .2505e-03   -.8393e-04   -.1172e-05
  88   .8800e+00   .1507e-03    .2277e-03   -.7701e-04   -.1066e-05
  89   .8900e+00   .1364e-03    .2070e-03   -.7065e-04   -.9687e-06
  90   .9000e+00   .1234e-03    .1882e-03   -.6481e-04   -.8806e-06
  91   .9100e+00   .1117e-03    .1711e-03   -.5944e-04   -.8006e-06
  92   .9200e+00   .1010e-03    .1556e-03   -.5451e-04   -.7278e-06
  93   .9300e+00   .9142e-04    .1414e-03   -.4999e-04   -.6616e-06
  94   .9400e+00   .8272e-04    .1286e-03   -.4583e-04   -.6015e-06
  95   .9500e+00   .7485e-04    .1169e-03   -.4202e-04   -.5468e-06
  96   .9600e+00   .6773e-04    .1062e-03   -.3851e-04   -.4971e-06
  97   .9700e+00   .6128e-04    .9658e-04   -.3530e-04   -.4519e-06
  98   .9800e+00   .5545e-04    .8780e-04   -.3235e-04   -.4108e-06
  99   .9900e+00   .5017e-04    .7982e-04   -.2965e-04   -.3735e-06
 100   .1000e+01   .4540e-04    .7257e-04   -.2717e-04   -.3395e-06

... timestep control scheme, the corresponding results are shown in Table
6.3.

In both Table 6.2 and Table 6.3, the first column of data is the
time point, the second column the exact solution, the third column
the solution using the backward Euler formula, the fourth the global
error and the fifth the estimated local truncation error. It can be
seen that, by using the fixed timestep scheme, the order of magnitude
(decimal exponent) of the global error varies from -1 to -4, and that
of the local truncation error from -2 to -6, which indicates that in
some time intervals the stepsize is too large and in other intervals
the stepsize is unnecessarily small. On the other hand, by using the
new timestep control scheme, the orders of magnitude of the global
error and the local truncation error are kept within the range from
-2 to -3 and the range from -3 to -4, respectively.

6.2.3.  Conclusions 

Dynamically varying the timestep is necessary for the timing
analysis program to evaluate the simulated results accurately and
efficiently. To ensure an accurate transient analysis, the timestep
must be controlled to produce an acceptable amount of local truncation
error at each timepoint. In this chapter, a new algorithm of
timestep control is proposed, in which the next timestep is predicted
using the LTE at the present timepoint and no timestep is rejected.
The PREMOS program employs the backward Euler integration with the
LTE timestep control described in this chapter.

Table 6.3  Numerical Solution Obtained by Using New Timestep Control
Scheme with λ = -10, x_0 = 1, epa = 0.01 and epr = 0.05.

   n   Time        Exact Sol.   Bk Euler     Glb Err.     LTE
   0   .0000e+00   .1000e+01    .1000e+01    .0000e+00    .0000e+00
   1   .5000e-02   .9512e+00    .9524e+00   -.1152e-02   -.1152e-02
   2   .1000e-01   .9048e+00    .9070e+00   -.2192e-02   -.1097e-02
   3   .1500e-01   .8607e+00    .8638e+00   -.3130e-02   -.1044e-02
   4   .2000e-01   .8187e+00    .8227e+00   -.3972e-02   -.9947e-03
   5   .2500e-01   .7788e+00    .7835e+00   -.4725e-02   -.9474e-03
   6   .3000e-01   .7408e+00    .7462e+00   -.5397e-02   -.9023e-03
   7   .3500e-01   .7047e+00    .7107e+00   -.5993e-02   -.8593e-03
   8   .4000e-01   .6703e+00    .6768e+00   -.6519e-02   -.8184e-03
   9   .4500e-01   .6376e+00    .6446e+00   -.6981e-02   -.7794e-03
  10   .5000e-01   .6065e+00    .6139e+00   -.7383e-02   -.7423e-03
  11   .5500e-01   .5769e+00    .5847e+00   -.7729e-02   -.7069e-03
  12   .6000e-01   .5488e+00    .5568e+00   -.8026e-02   -.6733e-03
  13   .6500e-01   .5220e+00    .5303e+00   -.8276e-02   -.6412e-03
  14   .7000e-01   .4966e+00    .5051e+00   -.8483e-02   -.6107e-03
  15   .7500e-01   .4724e+00    .4810e+00   -.8651e-02   -.5816e-03
  16   .8000e-01   .4493e+00    .4581e+00   -.8783e-02   -.5539e-03
  17   .8500e-01   .4274e+00    .4363e+00   -.8882e-02   -.5275e-03
  18   .9000e-01   .4066e+00    .4155e+00   -.8951e-02   -.5024e-03
  19   .9500e-01   .3867e+00    .3957e+00   -.8993e-02   -.4785e-03
  20   .1000e+00   .3679e+00    .3769e+00   -.9010e-02   -.4557e-03
  21   .1050e+00   .3499e+00    .3589e+00   -.9005e-02   -.4340e-03
  22   .1100e+00   .3329e+00    .3418e+00   -.8979e-02   -.4133e-03
  23   .1150e+00   .3166e+00    .3256e+00   -.8935e-02   -.3936e-03
  24   .1200e+00   .3012e+00    .3101e+00   -.8874e-02   -.3749e-03
  25   .1250e+00   .2865e+00    .2953e+00   -.8798e-02   -.3571e-03
  26   .1300e+00   .2725e+00    .2812e+00   -.8709e-02   -.3400e-03
  27   .1350e+00   .2592e+00    .2678e+00   -.8608e-02   -.3239e-03
  28   .1400e+00   .2466e+00    .2551e+00   -.8497e-02   -.3084e-03
  29   .1450e+00   .2346e+00    .2429e+00   -.8376e-02   -.2937e-03
  30   .1500e+00   .2231e+00    .2314e+00   -.8247e-02   -.2798e-03
  31   .1550e+00   .2122e+00    .2204e+00   -.8112e-02   -.2664e-03
  32   .1600e+00   .2019e+00    .2099e+00   -.7970e-02   -.2538e-03
  33   .1650e+00   .1920e+00    .1999e+00   -.7823e-02   -.2417e-03
  34   .1700e+00   .1827e+00    .1904e+00   -.7671e-02   -.2302e-03
  35   .1750e+00   .1738e+00    .1813e+00   -.7516e-02   -.2192e-03
  36   .1800e+00   .1653e+00    .1727e+00   -.7359e-02   -.2088e-03
  37   .1850e+00   .1572e+00    .1644e+00   -.7198e-02   -.1988e-03
  38   .1900e+00   .1496e+00    .1566e+00   -.7037e-02   -.1894e-03
  39   .1950e+00   .1423e+00    .1491e+00   -.6874e-02   -.1803e-03
  40   .2000e+00   .1353e+00    .1420e+00   -.6710e-02   -.1717e-03
  41   .2050e+00   .1287e+00    .1353e+00   -.6547e-02   -.1636e-03
  42   .2100e+00   .1225e+00    .1288e+00   -.6383e-02   -.1558e-03
  43   .2150e+00   .1165e+00    .1227e+00   -.6220e-02   -.1484e-03
  44   .2200e+00   .1108e+00    .1169e+00   -.6058e-02   -.1413e-03
  45   .2250e+00   .1054e+00    .1113e+00   -.5897e-02   -.1346e-03
  46   .2300e+00   .1003e+00    .1060e+00   -.5738e-02   -.1282e-03
  47   .2350e+00   .9537e-01    .1009e+00   -.5580e-02   -.1221e-03
  48   .2400e+00   .9072e-01    .9614e-01   -.5424e-02   -.1162e-03
  49   .2450e+00   .8629e-01    .9156e-01   -.5270e-02   -.1107e-03
  50   .2500e+00   .8208e-01    .8720e-01   -.5119e-02   -.1054e-03
  51   .2550e+00   .7808e-01    .8305e-01   -.4970e-02   -.1004e-03
  52   .2600e+00   .7427e-01    .7910e-01   -.4823e-02   -.9564e-04
  53   .2700e+00   .6721e-01    .7191e-01   -.4700e-02   -.3364e-03
  54   .2800e+00   .6081e-01    .6537e-01   -.4559e-02   -.3059e-03
  55   .2900e+00   .5502e-01    .5943e-01   -.4403e-02   -.2780e-03
  56   .3000e+00   .4979e-01    .5402e-01   -.4237e-02   -.2528e-03
  57   .3100e+00   .4505e-01    .4911e-01   -.4063e-02   -.2298e-03
  58   .3200e+00   .4076e-01    .4465e-01   -.3886e-02   -.2089e-03
  59   .3300e+00   .3688e-01    .4059e-01   -.3706e-02   -.1899e-03
  60   .3400e+00   .3337e-01    .3690e-01   -.3526e-02   -.1726e-03
  61   .3500e+00   .3020e-01    .3354e-01   -.3347e-02   -.1569e-03
  62   .3600e+00   .2732e-01    .3050e-01   -.3171e-02   -.1427e-03
  63   .3700e+00   .2472e-01    .2772e-01   -.2999e-02   -.1297e-03
  64   .3800e+00   .2237e-01    .2520e-01   -.2832e-02   -.1179e-03
  65   .3900e+00   .2024e-01    .2291e-01   -.2669e-02   -.1072e-03
  66   .4000e+00   .1832e-01    .2083e-01   -.2513e-02   -.9745e-04
  67   .4100e+00   .1657e-01    .1894e-01   -.2362e-02   -.8859e-04
  68   .4200e+00   .1500e-01    .1721e-01   -.2218e-02   -.8054e-04
  69   .4300e+00   .1357e-01    .1565e-01   -.2080e-02   -.7322e-04
  70   .4400e+00   .1228e-01    .1423e-01   -.1949e-02   -.6656e-04
  71   .4500e+00   .1111e-01    .1293e-01   -.1824e-02   -.6051e-04
  72   .4700e+00   .9095e-02    .1078e-01   -.1682e-02   -.1889e-03
  73   .4900e+00   .7447e-02    .8981e-02   -.1535e-02   -.1574e-03
  74   .5100e+00   .6097e-02    .7484e-02   -.1388e-02   -.1311e-03
  75   .5300e+00   .4992e-02    .6237e-02   -.1245e-02   -.1093e-03
  76   .5500e+00   .4087e-02    .5197e-02   -.1111e-02   -.9108e-04
  77   .5700e+00   .3346e-02    .4331e-02   -.9852e-03   -.7590e-04
  78   .6100e+00   .2243e-02    .3094e-02   -.8508e-03   -.1904e-03
  79   .6500e+00   .1503e-02    .2210e-02   -.7064e-03   -.1360e-03
  80   .7300e+00   .6755e-03    .1228e-02   -.5521e-03   -.2347e-03
  81   .8100e+00   .3035e-03    .6820e-03   -.3785e-03   -.1304e-03
  82   .9700e+00   .6128e-04    .2623e-03   -.2010e-03   -.1246e-03
  83   .1000e+01   .4540e-04    .2018e-03   -.1564e-03   -.7453e-05


CHAPTER 7

The  PREMOS  Program 


7.1. Introduction

PREMOS (PREdiction-based simulator for MOS circuits) is an
experimental simulator program for VLSI MOS digital circuits. The
object of the program is to close the gap between conventional
circuit simulation and logic simulation. It is faster than
conventional circuit simulators because it uses a Gauss-Seidel
circuit simulation scheme and employs built-in models for the
subcircuits. Although it is slower than logic simulators, it
generates electrical waveforms, which are more accurate than the
logic levels produced by logic simulators.

In PREMOS, a modified block Gauss-Seidel-Newton algorithm is
used instead of the standard point Gauss-Jacobi algorithm used in
MOTIS and MOTIS-C. Compared with MOTIS and MOTIS-C, the accuracy of
the results is improved at three levels: (1) circuit analysis is used
to solve the unilateral subcircuit equations, (2) multiple nonlinear
iterations are used at each time point, and (3) the predictor method
is used for resolving the feedback interdependence. In addition, an
analysis sequencing algorithm based on relevant parts is used to
improve the speed of the simulation. Because of the additional
iterations, PREMOS is generally about five times slower than
MOTIS-C, whereas the speed and circuit-size capability of MOTIS-C
have been claimed to be over two orders of magnitude greater than
those of SPICE2 [7].

PREMOS evolved from MOTIS-C but has different data structures
and new analysis algorithms. It is written in FORTRAN and currently
contains more than 3000 statements.

7.2. The Input Processing

The input processor reads and processes the model file of MOS
devices and the input data file containing the description of
circuit elements and control statements. There are a number of
built-in models for the partitioned subcircuits in this program. A
list of circuit elements and their corresponding models is shown in
Appendix 2. The control statements are listed in Appendix 3. The
input processor constructs the internal node table and the linked
lists for the structure of the circuit. The data structures for
subcircuit models are shown in Appendix 4. The data generated are
passed to the analysis part of the program through disc files.

The subroutine EROR performs error checking when input data are
read. If an error is found, the program prints the error messages
and stops.


7.3. The Analysis Core


The analysis core of the program includes two phases: the analysis
sequencing phase and the analysis phase. They are described in turn
in this section.

7.3.1. Analysis Sequencing Phase

Before the transient analysis is performed, the sequence for
analyzing the subcircuits is generated in this phase. First, the
linked list of the corresponding directed graph is constructed.
Algorithm 5 mentioned in Chapter 3 is then executed to provide the
analysis sequence. During the sequencing procedure, the feedback
paths and the floating capacitors are identified. Finally, the
subcircuits that do not belong to the relevant set are deleted from
the sequence.
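
The sequencing phase can be illustrated with a short Python sketch. It is not Algorithm 5 of Chapter 3, which is not reproduced in this chapter; it only assumes that the circuit is available as a hypothetical dictionary `fanout` mapping each subcircuit to the subcircuits it drives, finds the relevant set by a backward search from the requested outputs, and orders the relevant subcircuits by a depth-first search that records loop-closing edges as feedback paths.

```python
def relevant_set(fanout, outputs):
    """Subcircuits that directly or indirectly drive any requested output."""
    # Build the reverse graph (fan-in lists) once.
    fanin = {name: [] for name in fanout}
    for src, dsts in fanout.items():
        for dst in dsts:
            fanin.setdefault(dst, []).append(src)
    seen, stack = set(), list(outputs)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(fanin.get(node, []))
    return seen


def analysis_sequence(fanout, outputs):
    """Return (sequence, feedback_edges) restricted to the relevant set."""
    keep = relevant_set(fanout, outputs)
    state = {}  # node -> "active" while on the DFS path, "done" afterwards
    postorder, feedback = [], []

    def visit(node):
        state[node] = "active"
        for nxt in fanout.get(node, []):
            if nxt not in keep:
                continue  # irrelevant subcircuits are never scheduled
            if state.get(nxt) == "active":
                feedback.append((node, nxt))  # this edge closes a loop
            elif nxt not in state:
                visit(nxt)
        state[node] = "done"
        postorder.append(node)

    for node in sorted(keep):
        if node not in state:
            visit(node)
    # Reverse postorder places every driver before the subcircuits it
    # drives, except across the recorded feedback edges.
    return postorder[::-1], feedback
```

For a circuit in which S1 drives S2, S2 drives S1 and S3, and an unrelated S4 drives S5, requesting output S3 schedules only S1, S2 and S3 and reports the edge from S2 back to S1 as a feedback path.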


7.3.2. Analysis Phase

Following the analysis sequencing, each scheduled subcircuit is
identified and linked to its models. Initially, device sizes and
node tables are read out by means of the pointers in the model map.
If feedback paths or floating capacitors exist, the associated node
voltages are predicted. If a subcircuit has been declared latent at
a previous timepoint and its fan-in node voltage changes are less
than a certain limit, the subcircuit is bypassed during the
analysis. At each nonlinear iteration, the device models are
evaluated and the subcircuit matrix is formed. In solving the
matrix, no sparse matrix technique and no reordering scheme are used
in the LU factorization process, since the size of the subcircuit is
usually small. The number of nonlinear iterations is specified by
the user. The timestep can be either fixed or controlled by the
local truncation error as described in Chapter 6, depending on the
user's option.
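
The bypass test described above can be sketched in Python as follows. The function name, its arguments and the use of a single voltage limit are assumptions made for illustration; the actual latency criterion of PREMOS is the one given in Chapter 6.

```python
def can_bypass(was_latent, fanin_now, fanin_prev, limit):
    """Return True if a subcircuit may be skipped at this timepoint.

    was_latent  -- subcircuit was declared latent at a previous timepoint
    fanin_now   -- present fan-in node voltages
    fanin_prev  -- fan-in node voltages at the previous timepoint
    limit       -- largest fan-in voltage change still treated as "no change"
    """
    if not was_latent:
        return False
    return all(abs(now - prev) < limit
               for now, prev in zip(fanin_now, fanin_prev))
```

A subcircuit that was latent and whose inputs moved by only a millivolt would be skipped with `limit = 0.01`, whereas any input change of half a volt forces it to be analyzed again.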


7.4. The Output Processing

The output from the program is written to a disc file, which can be
read by the output processor. The output data are sent to either
the line printer or the plotting terminal through the output
processor. Hard copies of the plots of the selected waveforms can
also be produced on an X-Y plotter.


In this section, four examples of circuit simulations using
PREMOS are presented. Example 1 illustrates the timing analysis of a
PLA circuit. The input data file is also included. Examples 2 and 3
show the improvement in the accuracy of the simulated results
obtained by using the predictor method. Comparisons with SPICE2 and
MOTIS-C are also included in these two examples. Example 4 shows the
effect of analyzing only the relevant parts on reducing the
simulation time.

7.5.1. PLA Circuit

The circuit diagram of a programmable logic array (PLA), which
is used to implement a traffic light controller, is shown in Fig. 7.1
[39]. This PLA circuit, which is composed of about 150 MOS
transistors, can be partitioned into 42 unilateral subcircuits or
circuit elements. The input data file used for circuit simulation by
PREMOS is included in Appendix 5.

The input and output waveforms for this example simulated by
PREMOS are shown in Fig. 7.2. The total analysis time is 12.450
seconds, compared with 5.567 seconds taken by MOTIS-C. Table 7.1,
which can be found in [39], is given to verify the simulated results.

7.5.2. Bootstrap Circuit

The bootstrap capacitor circuit in Fig. 7.3 has become very
popular in MOS digital circuit design for fast switching operations
and large driving capability. The simulated results are shown in
Fig. 7.4, where it can be seen that the modified Gauss-Seidel method
used in PREMOS is more accurate than the standard Gauss-Seidel method
(represented by PREMOS without predictor) and the Gauss-Jacobi method
used in MOTIS-C. In this comparison the SPICE2 result is taken as
the exact solution.

In this example, the analysis time is 17.60 seconds for SPICE2,
1.017 seconds for PREMOS with predictor, 1.200 seconds for PREMOS
without predictor and 0.467 seconds for MOTIS-C. Both PREMOS and
MOTIS-C use a fixed timestep scheme with a 0.5 ns timestep. PREMOS
uses three nonlinear iterations at each timepoint.


Fig. 7.2 (a) Voltage Waveforms φ1, φ2, TL, TS, Y0, and Y1
for the PLA Circuit in Fig. 7.1.


Table 7.1 Encoded State Transition Table for the Light Controller.

Inputs (C, TL, TS) and present state: stored during φ1 in In-register.
Next state and outputs: stored during φ2 in Out-register.

Present state   Next state   ST  HL1 HL0  FL1 FL0   Product term

0,0 (HG)        0,0 (HG)     0   0   0    1   0     R1
0,0 (HG)        0,0 (HG)     0   0   0    1   0     R2
0,0 (HG)        0,1 (HY)     1   0   0    1   0     R3
0,1 (HY)        0,1 (HY)     0   0   1    1   0     R4
0,1 (HY)        1,1 (FG)     1   0   1    1   0     R5
1,1 (FG)        1,1 (FG)     0   1   0    0   0     R6
1,1 (FG)        1,0 (FY)     1   1   0    0   0     R7
1,1 (FG)        1,0 (FY)     1   1   0    0   0     R8
1,0 (FY)        1,0 (FY)     0   1   0    0   1     R9
                0,0 (HG)

(The entries of the input columns C, TL and TS, and the remaining
entries of the last row, are not legible in this copy.)

7.5.3. One-Bit Register

The block diagram and circuit schematic of a one-bit register
circuit are shown in Fig. 7.5. This design has a memory function. A
feedback path exists from the output of S2 to the input of S1.
Comparison with the SPICE2 results in Fig. 7.6 shows that PREMOS
with predictor produces more accurate results than both PREMOS
without predictor and MOTIS-C.

In  this  example,  the  analysis  time  is  19.05  seconds  for  SPICE2, 
1.267  seconds  for  PREMOS  with  predictor,  1.000  seconds  for  PREMOS 
without  predictor  and  0.433  seconds  for  MOTIS-C.  Both  PREMOS  and 
MOTIS-C  employ  a  fixed  timestep  of  0.5  nanoseconds  in  the  transient 
analysis.  In  PREMOS  three  nonlinear  iterations  are  used  at  each 
timepoint.

7.5.4. Binary-to-Octal Decoder

A block diagram of the binary-to-octal decoder circuit is shown
in Fig. 7.7. In this example only the subcircuits that directly or
indirectly affect the output are analyzed during the solution
process. The analysis results of selecting one, two, four or eight
outputs are shown in Table 7.2. Because of the overhead involved in
the simulation, the total CPU time taken in each case is not
proportional to the number of relevant subcircuits.
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CHAPTER  8 


Conclusions 

The aim of large-scale circuit simulation is to bridge the gap
between conventional circuit simulation and logic simulation. In the
experimental program PREMOS developed as part of this dissertation,
the subcircuits are analyzed at the transistor level by using
Newton's method, as is done in conventional circuit simulation; but
the signal propagation from subcircuit to subcircuit, which
determines the analysis sequence of these unilateral subcircuits, is
similar to that used in logic simulation. The transistor-level
simulation in the subcircuits provides the detailed waveforms. The
analysis sequencing combined with latency checking significantly
reduces the computation time and memory requirements.

The analysis sequencing procedure, which includes checking and
identifying feedback loops, has been presented in Chapter 3. The
procedure schedules only those subcircuits that directly or
indirectly affect the output. Combined with latency checking, this
'segmentation' approach achieves a further increase in speed. The
amount of increase depends on the circuit being analyzed.

In Chapter 4 we discuss initial DC analysis in large-scale
circuit simulation and compare the different algorithms using
2-element and 3-element companion models for the MOS transistors. It
is found that the 2-element model of the MOS transistor is suitable
for large-scale circuit simulation from the point of view of
convergence rate. The analysis results also show that using a small
fixed number of iterations produces DC solutions close enough to
those obtained after many more iterations, provided the initial guess
is a good approximation.

In using the standard Gauss-Seidel method for solving the
partitioned circuit, the feedback loop is decoupled by assuming that
there is no change in the feedback loop over the integration
timestep. In this thesis, a 'modified' Gauss-Seidel method is
proposed, where explicit formulas are used to predict the 'unsolved'
variables in the feedback loops. As a result, the accuracy is
improved without requiring much additional computation. It has been
shown that the method is consistent, stable and convergent. It has
also been shown that no parasitic oscillatory component appears in
the solution if the timestep is smaller than a critical timestep.
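
The difference between the two ways of decoupling a feedback loop can be sketched in Python. This is an illustration under stated assumptions, not the predictor of this thesis: linear extrapolation from the last two accepted timepoints stands in for the explicit prediction formulas, and the function and argument names are hypothetical.

```python
def decouple_standard(v_now):
    """Standard Gauss-Seidel: assume the feedback voltage does not change
    over the coming timestep."""
    return v_now


def decouple_predicted(v_now, v_prev, h_now, h_prev):
    """Predict the feedback voltage at t + h_now by linear extrapolation.

    v_now, v_prev -- feedback node voltage at the last two timepoints
    h_prev        -- timestep between those two timepoints
    h_now         -- timestep about to be taken
    """
    slope = (v_now - v_prev) / h_prev
    return v_now + h_now * slope
```

For a feedback node that rose from 1 V to 2 V over the previous 0.5 ns step, the standard scheme would analyze the driven subcircuit with 2 V, whereas the extrapolation supplies 3 V for the next 0.5 ns step.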

As the entire circuit is partitioned into 'one-way' subcircuits,
each of which can be easily identified as an 'event' during the
simulation, latency detection and exploitation are used to provide
additional computational savings. The latency criterion for PREMOS
is described in Chapter 6. In the same chapter, a timestep control
scheme based on the local truncation error (LTE) is discussed. It
should be noted that the proposed timestep control scheme does not
reject the present timestep even when the LTE bound is exceeded.

The program PREMOS is described in some detail in Chapter 7. It
is written for use on a VAX 11/780 running the UNIX operating
system. Several simulation examples are given to show the validity
of the new algorithms and schemes.

PREMOS could be used for the timing analysis of MOS integrated
circuits in hierarchical design. For general-purpose usage, more
enhancements to the program are needed. The input processor should
be able to expand macros or nested subcircuits. Furthermore, the
capability of partitioning the circuit automatically should be added.
In this way, the program could be used for verifying the circuit
extracted from the layout. For other IC technologies such as CMOS
and I²L, in which unilateral gates (subcircuits) can be formed
easily, the analysis techniques used in PREMOS can be applied to
develop similar kinds of simulators.

As described in [23], the speed of logic simulation could be
increased by more than several hundred times by using logic
processors and array processors. Similarly, it could be possible to
implement decomposition algorithms in hardware and analyze 'one-way'
subcircuits by using multi-processors to gain orders of magnitude in
execution time for the next generation of circuit simulators [12].
Further research on this kind of simulation machine could be
promising in the future.
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Appendix  1 

MOS Device Modeling and Capacitor Modeling


In conventional circuit analysis, the MOS device model shown in
Fig. A1.1(a) is generally used (the charge storage effects are not
shown here). The three values g_m, g_ds and i_0 are calculated for
each DC iteration. The substrate bias effect is included through
changes in the threshold voltages. For large-scale circuit analysis,
the MOS device model for the enhancement transistor could be
simplified further as shown in Fig. A1.1(b). Only two values, g_ds
and i_0, need to be calculated at each DC iteration.

Recently, short-channel MOS devices have become widely used, and
"second-order" effects such as mobility reduction, channel length
modulation and substrate bias effects are becoming increasingly
important in deriving device models. In order to obtain accurate
simulated results, these effects should be taken into account. The
MOS device modeling work in PREMOS is based on curve-fitting
empirical curves of DC characteristics, with the emphasis on matching
the saturation points and the conductances in the saturation regions
for different V_GS.

The modeling equations are

$$ I_{DS} = KP \cdot \frac{1 + \lambda V_{DS}}{1 + \eta (V_{GS} - V_T)} \left[ (V_{GS} - V_T) V_{DS} - \tfrac{1}{2} V_{DS}^2 \right] \qquad \text{(A1.1)} $$

for operation in the linear region,

$$ I_{DS} = \frac{KP}{2} \cdot \frac{1 + \lambda V_{DS}}{1 + \eta (V_{GS} - V_T)} (V_{GS} - V_T)^2 \qquad \text{(A1.2)} $$

for operation in the saturation region, and

$$ V_T = V_{T0} + \Delta V_T \qquad \text{(A1.3)} $$

where

KP = intrinsic transconductance (= \mu_0 C_{ox} W / L)
\lambda = channel length modulation parameter
\eta = mobility reduction parameter
V_{T0} = threshold voltage at which the DC curves are measured
\Delta V_T = threshold voltage change from V_{T0} due to substrate
bias voltage change

\Delta V_T is represented in tabular form as a function of the
source-to-substrate voltage V_{SB}.

The extraction of the DC model parameters from the physical
device can be done in a straightforward manner by using
curve-fitting techniques. A special computer program can be
developed for this purpose.

The capacitance at each node in LSI or VLSI circuits consists of
two types: (1) the voltage-dependent capacitance formed by MOS
devices, which includes gate capacitance and diffusion capacitance,
and (2) the interconnect capacitance. As device dimensions are
scaled down, the latter plays an increasingly important role in the
circuit behavior. The features and modeling of these capacitances
are described below.

(1) The voltage-dependent capacitance: gate capacitance and
diffusion capacitance.

The voltage dependence of the gate capacitance of an MOS
transistor is shown in Fig. A1.2 [40]. For the diffusion capacitance
c, the voltage dependence can be expressed as

$$ c = \frac{c_{j0}}{(1 - v / V_{bi})^{e}} \qquad \text{(A1.4)} $$

where

c_{j0} = diffusion capacitance at zero junction voltage
v = junction voltage
V_{bi} = junction contact potential
e = grading constant

In large-scale circuit simulation, it is rather expensive to
calculate these voltage-dependent capacitances at every iteration.
These capacitances are therefore lumped together and approximated by
a linear capacitance. The value of this linear capacitance depends
on the device size and on the processing parameters. This value
could be experimentally determined as a function of device size and
gate oxide capacitance.
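
Equation (A1.4) can be evaluated directly, which makes the cost argument concrete: the expression below would have to be recomputed for every junction at every iteration if the capacitance were not lumped. The Python function is a minimal sketch; its name and the guard against v reaching V_bi are additions made for illustration.

```python
def diffusion_capacitance(cj0, v, vbi, e):
    """Diffusion capacitance c = cj0 / (1 - v/vbi)**e of (A1.4).

    cj0 -- capacitance at zero junction voltage
    v   -- junction voltage (negative for reverse bias)
    vbi -- junction contact potential
    e   -- grading constant
    """
    if v >= vbi:
        raise ValueError("junction voltage must be below the contact potential")
    return cj0 / (1.0 - v / vbi) ** e
```

With an assumed contact potential of 0.7 V and a grading constant of 0.5, a reverse bias of 2.1 V halves the zero-bias capacitance, since the denominator becomes the square root of 4.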

(2) Modeling of the interconnection capacitance.

Poly and metal capacitances are usually calculated by applying
the parallel-plate formula. However, scaled-down technology produces
interconnection conductors that are comparable in dimension to the
thickness of the oxide, so the parasitic capacitance of the various
interconnections can no longer be treated as the capacitance of
infinite parallel plates, because of fringing field effects. For
today's interconnection systems, an error of a factor of two could
result from fringing fields alone! It is then necessary to correct
the capacitance from the parallel-plate formula by a correction
factor. The correction factor c/c_0 could be evaluated either
experimentally with test chips or theoretically, where

c = the interconnect wiring parasitic capacitance per unit length

c_0 = the capacitance per unit length from the parallel-plate
formula [41, 42].

Thus, the magnitude of the interconnection capacitance c_intcon can
be obtained from

$$ c_{intcon} = c_0 \cdot L \cdot (\text{Correction Factor}) \qquad \text{(A1.5)} $$
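
A short Python sketch of (A1.5) follows. It assumes that L is the length of the interconnection, that the dielectric is silicon dioxide with relative permittivity 3.9, and that the parallel-plate value per unit length is the permittivity times conductor width over oxide thickness; the function names are hypothetical, and the correction factor must come from measurement or from the cited theory.

```python
EPS_OX = 3.9 * 8.854e-12  # permittivity of SiO2 in F/m (assumed dielectric)


def parallel_plate_per_length(width, t_ox):
    """Parallel-plate capacitance per unit length, c_0, in F/m."""
    return EPS_OX * width / t_ox


def interconnect_capacitance(c0, length, correction_factor):
    """Interconnection capacitance of (A1.5): c_0 * L * correction factor."""
    return c0 * length * correction_factor
```

For example, a conductor 2 um wide over 1 um of oxide, 100 um long, with a fringing correction factor of 2 (the size of error mentioned above), would be evaluated as `interconnect_capacitance(parallel_plate_per_length(2e-6, 1e-6), 100e-6, 2.0)`.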
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Appendix  2 

Input Descriptions for Circuit Elements and Their Models


The  following  types  of  subcircuit  models  have  been  implemented 
in  the  program  PREMOS: 

NAND2: 2-input NAND gate (Fig. A2.1)

NOR2 : 2-input NOR gate (Fig. A2.2)

ANDOI: n-input AND-OR-Inverter (Fig. A2.3)

ORANI: n-input OR-AND-Inverter (Fig. A2.4)

TRANS: n-input NOR gate with nt Transfer Gates (Fig. A2.5)

TRANP: same as TRANS except node tnt is taken as input

PUSPL: Push-Pull Inverter (Fig. A2.6)

LATCH: Latch Gate (Fig. A2.7)

SOURC: Clock (Voltage Source) Model

A model is described as

MODEL (mdnam) (type) (parameters)

for example,

MODEL m1 NAND2 (1 0.2 10f 20f 100f)

MODEL is the keyword for a model description, and mdnam is a
user-defined name for the model. The type field specifies the model
type. The available model types and their associated model
parameters are listed in the following table:

TYPE    PARAMETERS

NAND2   wla wll ca ci cl
NOR2    wlo wll co cl
ANDOI   wla wlo wll ca co ci cl na no
ORANI   wlo wla wll co ca ci cl no na
TRANS   wlo wll wlt co cl cg ct no nt
TRANP   wlo wll wlt co cl cg ct no nt
PUSPL   wla wll ca1 ca2 cl
LATCH   wla wll ca cl
SOURC   v1 v0 t0 tr t1 tf

Fig. A2.1 2-Input NAND Gate.

Fig. A2.6 Push-Pull Inverter.
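
How a MODEL card maps onto the parameter lists of this appendix can be sketched with a small Python parser. This is not the FORTRAN input processor of PREMOS. The parameter names are those read from the table in this appendix, only the scale suffixes f and n that occur in the input examples are recognised, and the function names are hypothetical.

```python
import re

# Scale-factor suffixes seen in the input examples (f = femto, n = nano).
SCALE = {"f": 1e-15, "n": 1e-9}

# Parameter names per model type, in the order listed in this appendix.
PARAMS = {
    "NAND2": ["wla", "wll", "ca", "ci", "cl"],
    "NOR2": ["wlo", "wll", "co", "cl"],
    "ANDOI": ["wla", "wlo", "wll", "ca", "co", "ci", "cl", "na", "no"],
    "ORANI": ["wlo", "wla", "wll", "co", "ca", "ci", "cl", "no", "na"],
    "TRANS": ["wlo", "wll", "wlt", "co", "cl", "cg", "ct", "no", "nt"],
    "TRANP": ["wlo", "wll", "wlt", "co", "cl", "cg", "ct", "no", "nt"],
    "PUSPL": ["wla", "wll", "ca1", "ca2", "cl"],
    "LATCH": ["wla", "wll", "ca", "cl"],
}


def parse_value(token):
    """Convert a token such as '10f' or '0.2' to a float."""
    suffix = token[-1].lower()
    if suffix in SCALE:
        return float(token[:-1]) * SCALE[suffix]
    return float(token)


def parse_model(card):
    """Parse 'MODEL name TYPE (p1 p2 ...)' into (name, type, parameters)."""
    match = re.match(r"\s*model\s+(\S+)\s+(\w+)\s*\((.*)\)\s*$",
                     card, re.IGNORECASE)
    if match is None:
        raise ValueError("not a MODEL card: %r" % card)
    name, mtype, body = match.groups()
    mtype = mtype.upper()
    names = PARAMS[mtype]
    values = [parse_value(tok) for tok in body.split()]
    if len(values) != len(names):
        raise ValueError("%s expects %d parameters" % (mtype, len(names)))
    return name, mtype, dict(zip(names, values))
```

Applied to the example card of this appendix, `parse_model("MODEL m1 NAND2 (1 0.2 10f 20f 100f)")` yields the name `m1`, the type `NAND2`, and a dictionary in which `wla` is 1.0 and `cl` is 100 fF expressed in farads.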

The circuit is described as

(name) (nodes) (mdnam)

for example,

N1 1 2 3 4 NAND2

"name" is the name of the circuit element. The nodes field
contains the node numbers describing the circuit connection. The
order of the node numbers for each type of subcircuit is listed in
the table below:


TYPE    ORDER OF NODE NUMBERS

NAND2   a1 a2 i l
NOR2    o1 o2 l
ANDOI   a1 a2 ... o1 o2 ... l i1 i2 ... i(na-1)
ORANI   o1 o2 ... a1 a2 ... l i1 i2 ... ina
TRANS   o1 o2 ... l g1 t1 ... gnt tnt
TRANP   o1 o2 ... l g1 t1 ... gnt tnt
PUSPL   a1 a2 l
LATCH   a1 a2 l1 l2
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Appendix  3 

Control Commands Used in Experimental Program PREMOS

1. TIME :

General Form TIME Tstop Tstep Dtmin
Tstop : the length of the analysis
Tstep : output print step
Dtmin : minimum internal timestep

2. PRESET :

General Form PRESET (n1,v1) (n2,v2) ...

n1,n2,... : node number

v1,v2,... : preset node voltage

3. PLOT :

General Form PLOT n1 n2 n3 ...

4. SEND :

General Form SEND n1 n2 n3 ...

The SEND command allows the user to generate the data file
plfile.dat, which contains the analysis results on nodes n1,
n2, ... . The file plfile.dat is used as the input data for
the graphing program graph.f.

5. global elements :

General Form (type) (value)

The global elements are

(i) v+, the drain- or the load-end supply source

(ii) v-, the source- or the driver-end supply source

(iii) vbg, the back gate supply voltage source

6. END :

General Form END

7. DC :

General Form DC

DC is the command requesting dc analysis.

8. CONTL :

General Form CONTL laten ltstp lpred

laten : flag for the latency scheme; 1 (yes), 0 (no)

ltstp : flag for the timestep control scheme; 1 (yes), 0 (no)

lpred : flag for the predictor scheme; 1 (yes), 0 (no)

9. OPT :

General Form OPT itnan itnor ittrs itpul itlch itao itoa


itnan : preset number of dc iterations for NAND2
itnor : preset number of dc iterations for NOR2
ittrs : preset number of dc iterations for TRANS
itpul : preset number of dc iterations for PUSPL
itlch : preset number of dc iterations for LATCH
itao  : preset number of dc iterations for ANDOI
itoa  : preset number of dc iterations for ORANI
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Appendix  4 

Analysis Data Structures for Subcircuit Models


The internal data structures representing the model of the
subcircuit look like the following:

LOC(ISUB) -->  LINK
               Number of Elements in This Field
               Model Type
               Pointer to Width List
               Pointer to Node List
               Feedback Node
               Pointer to Floating Capacitor List
               Others

The data structures for different types of subcircuits are
listed below:


NAND2

LOC     +0: 6
        +1: 1
        +2: INAN2
        +3: IRAN2
        +4: 0 or node number
        +5: 0 or IFCAP

WNAN2
INAN2   +0: w/l (driver)
        +1: w/l (load)

NAND2
IRAN2   +0: 1st input node
        +1: 2nd input node
        +2: internal node
        +3: output node


NOR2

LOC     +0: 6
        +1: 2
        +2: INOR2
        +3: IROR2
        +4: 0 or node number
        +5: 0 or IFCAP

WNOR2
INOR2   +0: w/l (driver)
        +1: w/l (load)

NOR2
IROR2   +0: 1st input node
        +1: 2nd input node
        +2: output node


ANDOI

LOC     +0: 6
        +1: 7
        +2: IRAND
        +3: IAND
        +4: 0 or node number
        +5: 0 or IFCAP

WAND
IRAND   +0: w/l (driver of AND)
        +1: w/l (driver of OR)
        +2: w/l (load)

NAND
IAND    +0: na
        +1: no
        +2 — +(na+1): a1 — ana
        +(na+2) — +(na+no+1): o1 — ono
        +(na+no+2): output node
        +(na+no+3) — +(na+no+na+1): i1 — i(na-1)


ORANI

LOC     +0: 6
        +1: 8
        +2: IROR
        +3: IOR
        +4: 0 or node number
        +5: 0 or IFCAP

WOR
IROR    +0: w/l (driver of OR)
        +1: w/l (driver of AND)
        +2: w/l (load)

NOR
IOR     +0: no
        +1: na
        +2 — +(no+1): o1 — ono
        +(no+2) — +(no+na+1): a1 — ana
        +(no+na+2): output node
        +(no+na+3) — +(no+2na+2): i1 — ina


TRANS

LOC     +0: 6
        +1: 4
        +2: IRTFR
        +3: ITRF
        +4: 0 or node number
        +5: 0 or IFCAP

WTFR
IRTFR   +0: w/l (driver)
        +1: w/l (load)
        +2: w/l (transfer gate)

NTFR
ITRF    +0: no
        +1: nt
        +2 — +(jor+1): o1 — ono
        +(jor+2): load node l
        +(jor+3): gate node g1
        +(jor+4): source node t1
        ...
        +(jor+2nt+1): gate node gnt
        +(jor+2nt+2): output node tnt


PUSPL

LOC     +0: 6
        +1: 5
        +2: IWPUL
        +3: INPUL
        +4: 0 or node number
        +5: 0 or IFCAP

WPUL
IWPUL   +0: w/l (driver)
        +1: w/l (load)

NPUL
INPUL   +0: driver gate node
        +1: load gate node
        +2: output node


LATCH

LOC     +0: 6
        +1: 6
        +2: IWLCH
        +3: ILCH
        +4: 0 or node number
        +5: 0 or IFCAP

WLH
IWLCH   +0: w/l (driver)
        +1: w/l (load)

NLH
ILCH    +0: 1st gate node
        +1: 2nd gate node
        +2: 1st output node
        +3: 2nd output node


SOURC

LOC     +0: 4
        +1: 3
        +2: IVSC
        +3: IRVSC

NSOR
IVSC    +0: node number
        +1: 0
        +2: 1

WSOR
IRVSC   +0:
        +1: TIME
        +2: VHIG

For floating capacitors, the data structures are:

CAJPCR

NCAP
IFCAP   +0: Node 1
        +1: Node 2

CCAP
IFCAP   +0: capacitor value

The order of node 1 and node 2 must coincide with the sequence of
analysis so that the modified Gauss-Seidel method can be applied.
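
The pointer-based layout of this appendix can be mimicked in Python to show how one gate record would be decoded. The sketch rests on several assumptions: the records live in flat lists (here called `link` and `nodes`, both hypothetical names), indices are 0-based rather than FORTRAN's 1-based, and only the six-entry gate records are handled, so the four-entry SOURC record is rejected.

```python
def read_record(link, loc):
    """Decode the six-entry header of one gate record starting at link[loc]."""
    count = link[loc]
    if count != 6:
        raise ValueError("only six-entry gate records are handled here")
    _, model_type, width_ptr, node_ptr, feedback_node, fcap_ptr = \
        link[loc:loc + 6]
    return {
        "model_type": model_type,        # e.g. 1 for NAND2, 2 for NOR2
        "width_ptr": width_ptr,          # start of the w/l entries
        "node_ptr": node_ptr,            # start of the node entries
        "feedback_node": feedback_node,  # 0 means no feedback node
        "fcap_ptr": fcap_ptr,            # 0 means no floating capacitor
    }


def nand2_nodes(nodes, node_ptr):
    """Return the four NAND2 node numbers in the order given above."""
    in1, in2, internal, out = nodes[node_ptr:node_ptr + 4]
    return {"in1": in1, "in2": in2, "internal": internal, "out": out}
```

A record `[6, 1, 0, 4, 0, 0]` placed at position 3 of `link` would thus be read as a NAND2 with its widths at offset 0, its nodes at offset 4, no feedback node and no floating capacitor.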
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Appendix  5 


Input Data File for the PLA Example


The input data file for simulating the PLA circuit in Fig. 7.1
by PREMOS is contained in this appendix.


PLA finite-state machine implementing the light controller

*subcircuit model card
model inv nor2 (5 1 10f 100f)
model nor3 andoi(5 5 1 10f 10f 10f 100f 0 3)
model nor4 andoi(5 5 1 10f 10f 10f 100f 0 4)
model notr1 trans(5 1 2 10f 100f 10f 50f 1 1)
model notr2 trans(5 1 2 10f 100f 10f 50f 2 1)
model notr4 trans(5 1 2 10f 100f 10f 50f 4 1)
model notr5 trans(5 1 2 10f 100f 10f 50f 5 1)
model clk1 source (4 1 10n 5n 10n 5n)
model clk2 source (5 0 5n 5n 5n 5n)

* AND plane
x1 11 17 19 1 nor3
x2 13 17 19 2 nor3
x3 12 14 17 19 3 nor4
x4 15 18 19 4 nor3
x5 16 18 19 5 nor3
x6 12 13 18 20 6 nor4
x7 11 18 20 7 nor3
x8 14 18 20 8 nor3
x9 15 17 20 9 nor3
x10 16 17 20 10 nor3

* OR plane
x11 5 6 7 8 9 21 56 28 notr5
x12 3 4 5 6 22 56 29 notr4
x13 3 5 7 8 10 23 56 30 notr5
x14 6 7 8 9 10 24 56 31 notr5
x15 4 5 25 56 32 notr2
x16 1 2 3 4 5 26 56 33 notr5
x17 9 10 27 56 34 notr2

* output registers
x18 28 35 55 49 notr1
x19 29 36 55 48 notr1
x20 30 30 37 inv
x21 31 31 38 inv
x22 32 32 39 inv
x23 33 33 40 inv
x24 34 34 41 inv

* input buffers
x25 57 42 55 45 notr1
x26 58 43 55 46 notr1
x27 59 44 55 47 notr1

* input registers
x28 45 45 50 inv
x29 46 46 51 inv
x30 47 47 52 inv
x31 48 48 53 inv
x32 49 49 54 inv
x33 50 50 11 inv
x34 45 45 12 inv
x35 51 51 13 inv
x36 46 46 14 inv
x37 52 52 15 inv
x38 47 47 16 inv
x40 53 53 17 inv
x41 48 48 18 inv
x42 54 54 19 inv
x43 49 49 20 inv

*input sources
va1 55 0 clk1 01000100010001
va2 56 0 clk1 00010001000100
vs0 57 0 clk2 1111000000111111
vb0 58 0 clk2 1111000000000000
vc0 59 0 clk2 1111111111111111

*analysis requests
opt 1 1 3 1 1 1 1
contl 1 0 1
preset (35,5) (36,5)
time 120n 1n
plot 55 56 42 43 44 35 36
plot 37 38 39 40 41 9 10
plot 1 2 3 4 5 6 7 8
send 55 56 42 43 44 35 36
send 7 9 37 38 39 40 41
v+ 5
end
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